Bucharest, Romania

BCDR Resilience Specialist

Job Description

Location: Remote
Contract Duration: 6 months (with possibility of extension)
Contract details: B2B/PFA or SRL


Role Overview

We are looking for a BCDR Resilience Specialist to support the assessment, design, and assurance of business continuity and disaster recovery capabilities for solution platforms, including AI-enabled services.

The role ensures that solutions demonstrate resilience under failure scenarios and meet governance, audit, and regulatory expectations prior to production deployment.

Key Responsibilities

Resilience & Availability Assurance

  • Assess and provide assurance that availability and resilience requirements are met across solution platforms.

  • Validate recovery time objective (RTO) and recovery point objective (RPO) definitions and ensure alignment with business criticality and service tiering.

  • Support solution teams in defining and validating appropriate recovery strategies.

Failure Analysis & Risk Reduction

  • Identify and address single points of failure (SPOFs) across:

    • application and platform components

    • AI agents and orchestration layers

    • data dependencies

    • third-party services

  • Assess architectural resilience and recommend mitigation strategies to improve fault tolerance and recovery.

BC/DR Planning & Runbooks

  • Review BC/DR plans and operational runbooks and ensure they are fit for purpose.

  • Validate recovery procedures, including their coverage of AI-specific failure modes, model dependencies, and recovery actions.

  • Support and participate in resilience testing, tabletop exercises, and recovery simulations.

Governance, Audit & Evidence

  • Provide credible resilience evidence to support:

    • governance approvals

    • audit readiness

    • regulatory requirements

  • Ensure resilience controls and recovery capabilities are documented, traceable, and defensible prior to production release.

Required Skills & Experience

Business Continuity & Disaster Recovery

  • Proven experience in BCDR, resilience engineering, or service continuity roles

  • Strong understanding of:

    • business impact analysis (BIA)

    • RTO/RPO definitions

    • high availability and disaster recovery strategies

  • Experience validating BC/DR readiness for production environments

Architecture & Platform Awareness

  • Ability to assess end-to-end solution architectures, including:

    • application and platform layers

    • data flows and dependencies

    • third-party integrations

  • Understanding of modern, distributed, and cloud-native architectures

AI & Emerging Technology Resilience

  • Awareness of AI-specific failure modes, such as:

    • model availability and dependency failures

    • orchestration and pipeline failures

    • data drift or data dependency issues

  • Ability to assess recovery strategies for AI-enabled components

Governance & Communication

  • Experience producing decision-ready documentation for governance, audit, and regulatory stakeholders

  • Strong communication skills, with the ability to articulate resilience risks and recovery capabilities clearly