DevOps / Platform Engineer – Healthcare SaaS (Azure, Python/Django)

Role Summary
We are seeking a senior, hands-on DevOps / Platform Engineer to own production readiness, reliability, security, multi-tenancy, and cost optimization for our Azure-based, AI-native healthcare platform.
This role is critical to ensuring the company operates as a secure, compliant, highly available SaaS platform as we onboard regulated healthcare customers and scale multi-tenant workloads. You will act as the platform owner, working closely with engineering, security, and leadership.

Key Responsibilities
Production Readiness & Reliability
Own production readiness across development, staging, and production environments
Design and implement:
Safe deployment strategies (blue-green, rolling, canary)
Automated rollbacks
Health checks, monitoring, and alerting
Operate and maintain high-availability, fault-tolerant systems
Lead incident response, root-cause analysis (RCA), and preventive remediation
Establish SLOs, SLIs, and operational runbooks
Multi-Tenancy & Scalability
Design and operate secure multi-tenant infrastructure for a healthcare SaaS platform
Implement tenant isolation across:
Compute
Network
Data
Configuration and secrets
Enable tenant-aware deployments and customer onboarding
Ensure scalability without cross-tenant impact, data leakage, or performance degradation

Azure Cloud Infrastructure
Design, build, and operate Azure-first infrastructure, including:
AKS and containerized microservices
App Services, Azure Functions, and background workers
Azure SQL, Cosmos DB, Blob Storage
VNETs, private endpoints, NSGs, firewalls, and ingress controls
Manage infrastructure using Infrastructure as Code (Terraform preferred; Bicep/ARM acceptable)
Ensure environments are reproducible, auditable, and secure by default

Azure Security & Identity
Implement and manage Azure security and identity services:
Microsoft Entra ID (RBAC, managed identities, conditional access)
Microsoft Defender for Cloud
Azure Key Vault (secrets, keys, certificates)
Enforce least-privilege access, strong authentication, and audit logging
Support SOC 2 Type I & II and HIPAA-aligned security controls
Partner with security and engineering teams on threat modeling and compliance readiness

CI/CD & Release Engineering
Build and maintain CI/CD pipelines using GitHub Actions and/or Azure DevOps
Enable:
Zero-downtime deployments
Versioned APIs and backward compatibility
Environment-specific configuration and secrets
Improve release reliability, deployment speed, and developer productivity

Cost Optimization (FinOps)
Monitor and optimize Azure cloud spend across environments and tenants
Implement:
Budgets, alerts, and cost attribution
Environment-level and tenant-level cost visibility
Right-size compute, storage, and networking resources
Partner with engineering and leadership on cost forecasting and optimization

AI & Data Platform Enablement
Support AI-native workloads, including Azure OpenAI–based services
Operate document ingestion and event-driven pipelines (fax, PDFs, clinical data)
Ensure secure handling of PHI and regulated healthcare data across pipelines
Support scalable, resilient background processing and async workloads

Customer Onboarding & Integrations
Support production onboarding of new healthcare customers
Enable repeatable, automated deployment and go-live processes
Support integrations with payer platforms, EHRs, and external vendors
Act as a technical escalation point during customer launches

Qualifications
Required
7+ years of experience in DevOps, SRE, or Platform Engineering
Strong hands-on experience with Microsoft Azure
Experience operating production SaaS platforms
Deep experience with:
Kubernetes / AKS
Docker and containerized workloads
CI/CD pipelines
Infrastructure as Code (Terraform, Bicep, or ARM)
Experience designing and operating multi-tenant architectures
Strong understanding of cloud security, identity, and access management
Experience with regulated or compliance-driven environments (SOC 2, HIPAA, etc.)
Experience with cloud cost optimization / FinOps
Experience with Azure OpenAI or AI/ML platforms

Nice to Have
Experience supporting Python / Django production systems
Prior ownership of production incident management and reliability metrics