BCDR Resilience Specialist
Job Description:
Location: Remote
Contract Duration: 6 months, with possibility of extension
Contract Type: B2B (PFA or SRL)
Role Overview
We are looking for a BCDR Resilience Specialist to support the assessment, design, and assurance of business continuity and disaster recovery capabilities for solution platforms, including AI-enabled services.
The role ensures that solutions demonstrate resilience under failure scenarios and meet governance, audit, and regulatory expectations prior to production deployment.
Key Responsibilities
Resilience & Availability Assurance
- Assess and provide assurance that availability and resilience requirements are met across solution platforms.
- Validate RTO/RPO definitions and ensure they align with business criticality and service tiering.
- Support solution teams in defining and validating appropriate recovery strategies.
Failure Analysis & Risk Reduction
- Identify and address single points of failure (SPOFs) across:
  - application and platform components
  - AI agents and orchestration layers
  - data dependencies
  - third-party services
- Assess architectural resilience and recommend mitigation strategies to improve fault tolerance and recovery.
BC/DR Planning & Runbooks
- Review BC/DR plans and operational runbooks and ensure they are fit for purpose.
- Validate recovery procedures, including AI-specific failure modes, model dependencies, and recovery actions.
- Support and participate in resilience testing, tabletop exercises, and recovery simulations.
Governance, Audit & Evidence
- Provide credible resilience evidence to support:
  - governance approvals
  - audit readiness
  - regulatory requirements
- Ensure resilience controls and recovery capabilities are documented, traceable, and defensible before production release.
Required Skills & Experience
Business Continuity & Disaster Recovery
- Proven experience in BCDR, resilience engineering, or service continuity roles
- Strong understanding of:
  - business impact analysis (BIA)
  - RTO/RPO definitions
  - high availability and disaster recovery strategies
- Experience validating BC/DR readiness for production environments
Architecture & Platform Awareness
- Ability to assess end-to-end solution architectures, including:
  - application and platform layers
  - data flows and dependencies
  - third-party integrations
- Understanding of modern, distributed, and cloud-native architectures
AI & Emerging Technology Resilience
- Awareness of AI-specific failure modes, such as:
  - model availability and dependency failures
  - orchestration and pipeline failures
  - data drift or data dependency issues
- Ability to assess recovery strategies for AI-enabled components
Governance & Communication
- Experience producing decision-ready documentation for governance, audit, and regulatory stakeholders
- Strong communication skills, with the ability to articulate resilience risks and recovery capabilities clearly