Job Openings
Senior Data Engineer
About the Job
Our client is a leading insights company that uses market-leading data to drive better marketing decisions for global companies.
Job Responsibilities
SQL Databases
- Design and refine schemas for OLTP and OLAP workloads (Azure SQL, Synapse, Delta Lake), incorporating partitioning, indexing, and row-level security for multi-tenant isolation.
- Define and manage data contracts and versioning, oversee schema evolution, and implement CDC and SCD patterns (a sketch follows this list).
- Optimise performance through query tuning, resource configuration, caching strategies, and cost controls.
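To make the CDC/SCD responsibility above concrete, here is a minimal sketch of an SCD Type 2 upsert on a Delta Lake dimension table. The table paths and the `customer_id`/`attr_hash`/`effective_ts`/`is_current`/`valid_from`/`valid_to` columns are illustrative assumptions, not the client's actual schema.

```python
# Minimal SCD Type 2 sketch with Delta Lake (illustrative names only).
from pyspark.sql import SparkSession, functions as F
from delta.tables import DeltaTable

spark = SparkSession.builder.getOrCreate()

# Hypothetical staged CDC batch: one row per changed or new business key.
updates = spark.read.format("delta").load("/lake/silver/customer_changes")
dim = DeltaTable.forPath(spark, "/lake/gold/dim_customer")

# Step 1: expire the current rows whose tracked attributes changed.
(dim.alias("t")
    .merge(updates.alias("s"),
           "t.customer_id = s.customer_id AND t.is_current = true")
    .whenMatchedUpdate(
        condition="t.attr_hash <> s.attr_hash",
        set={"is_current": "false", "valid_to": "s.effective_ts"})
    .execute())

# Step 2: insert new versions for keys that no longer have a current row
# (brand-new keys plus the ones expired in step 1).
still_current = (spark.read.format("delta").load("/lake/gold/dim_customer")
                 .where("is_current = true")
                 .select("customer_id"))
new_rows = (updates.join(still_current, "customer_id", "left_anti")
            .withColumn("is_current", F.lit(True))
            .withColumn("valid_from", F.col("effective_ts"))
            .withColumn("valid_to", F.lit(None).cast("timestamp")))
new_rows.write.format("delta").mode("append").save("/lake/gold/dim_customer")
```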
Data Pipelines
- Architect and build ELT/ETL workflows across batch and streaming using Azure Data Factory/Synapse/Databricks, Event Hubs/Service Bus, Functions, and containerised workloads (Container Apps/AKS).
- Deliver reliable, observable pipelines (idempotent, retryable, lineage-aware) with clear SLAs/SLOs and operational runbooks; see the sketch after this list.
- Implement CI/CD for data workloads (dbt/SQL projects, PySpark jobs, automated tests) using GitHub Actions and infrastructure-as-code (Terraform/Bicep).
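A minimal sketch of the "idempotent, retryable" property called out above: a retry wrapper with exponential backoff around a load step whose target partition is derived from the batch's logical run date, so re-running the same batch overwrites rather than appends. Function names, the path layout, and the pandas-based write are illustrative assumptions.

```python
import time
from pathlib import Path

import pandas as pd

def retry(attempts: int = 3, base_delay: float = 2.0):
    """Re-run a flaky step with exponential backoff before giving up."""
    def wrap(fn):
        def inner(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                try:
                    return fn(*args, **kwargs)
                except Exception:
                    if attempt == attempts:
                        raise
                    time.sleep(base_delay * 2 ** (attempt - 1))
        return inner
    return wrap

@retry(attempts=3)
def load_partition(df: pd.DataFrame, run_date: str) -> None:
    """Idempotent: the target partition is keyed on the batch's logical
    run date, so a re-run replaces the same partition instead of
    duplicating rows."""
    target = Path(f"/lake/bronze/orders/ingest_date={run_date}")  # hypothetical path
    target.mkdir(parents=True, exist_ok=True)
    df.to_parquet(target / "part-000.parquet")
```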
Data Enrichment
- Define and manage enrichment layers such as UPC/GS1, OCR/EXIF metadata, taxonomies, embeddings, and third-party data integrations (see the sketch after this list).
- Curate gold/semantic data models for analytics and product APIs, including ownership of feature/metric definitions and documentation.
- Collaborate with Data Science/ML teams to productionise feature stores, model outputs, drift monitoring, and evaluation datasets.
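As a small illustration of the OCR/EXIF enrichment layer above, a sketch that extracts EXIF tags with Pillow and OCR text with pytesseract, returning a single enrichment record. The library choices and the record's field names are assumptions for illustration; the posting does not prescribe a specific OCR stack.

```python
from PIL import Image, ExifTags
import pytesseract

def enrich_image(path: str) -> dict:
    """Extract EXIF metadata and OCR text from a product image."""
    img = Image.open(path)
    exif = {
        ExifTags.TAGS.get(tag_id, str(tag_id)): value
        for tag_id, value in img.getexif().items()
    }
    text = pytesseract.image_to_string(img)
    return {
        "source_path": path,
        "exif": exif,
        "ocr_text": text.strip(),
        # Downstream steps (UPC lookup, embeddings) would extend this record.
    }
```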
Azure Architecture & Governance
- Own the reference data architecture across ADLS Gen2, Synapse/Databricks, Azure SQL/SQL Server, Cosmos DB (including vector), Azure AI Search, Key Vault, and Purview.
- Embed security and compliance by default: encryption, secret management, RBAC/ABAC, retention policies, and GDPR/SOC 2 aligned controls.
- Drive observability using OpenTelemetry and Azure Monitor/App Insights, plus data quality checks, freshness SLAs, and lineage via Purview.
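To illustrate the observability point above, a minimal sketch that instruments a pipeline step with an OpenTelemetry span carrying row-count and freshness attributes. The tracer name, function, and attribute keys are illustrative, and exporter wiring to Azure Monitor/App Insights is assumed to be configured elsewhere.

```python
from datetime import datetime, timezone
from opentelemetry import trace

tracer = trace.get_tracer("pipelines.sales_ingest")  # illustrative name

def publish_gold_table(rows: list, as_of: datetime) -> None:
    """Record row counts and data freshness on the span so the monitoring
    backend can alert on freshness SLA breaches."""
    with tracer.start_as_current_span("publish_gold_table") as span:
        span.set_attribute("rows.published", len(rows))
        lag_seconds = (datetime.now(timezone.utc) - as_of).total_seconds()
        span.set_attribute("freshness.lag_seconds", lag_seconds)
        # ... write rows to the gold layer here ...
```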
Examples of What You'll Build
- A robust image ingestion and enrichment pipeline that validates assets, extracts OCR/UPC, computes embeddings, tracks lineage, and publishes search-ready views.
- A hybrid retrieval layer (vector + filters) across Cosmos DB and Azure AI Search to power similarity search and recommendations.
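A toy sketch of the hybrid retrieval idea above (vector similarity combined with structured filters), written in plain NumPy rather than the Cosmos DB or Azure AI Search SDKs; in production the same shape maps onto their vector queries plus filter expressions. The item schema (`embedding`, `category`, `sku`) is an illustrative assumption.

```python
import numpy as np

def hybrid_search(query_vec: np.ndarray, items: list, category: str = None, top_k: int = 5):
    """Filter on structured attributes first, then rank by cosine similarity."""
    candidates = [i for i in items if category is None or i["category"] == category]
    if not candidates:
        return []
    matrix = np.stack([i["embedding"] for i in candidates])
    q = query_vec / np.linalg.norm(query_vec)
    m = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    scores = m @ q
    order = np.argsort(scores)[::-1][:top_k]
    return [(candidates[idx]["sku"], float(scores[idx])) for idx in order]
```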
Minimum Qualifications
- Very strong Python and SQL skills (comfortable analysing complex query plans and working with both PySpark and pandas).
- 7+ years' experience in data engineering/architecture with end-to-end ownership of production SQL databases and pipelines.
- Deep hands-on experience with Azure data services: ADLS Gen2, Data Factory/Synapse/Databricks, Azure SQL/SQL Server, Functions, Event Hubs/Service Bus, Key Vault.
- Solid background in data modeling (star/snowflake, Data Vault/Lakehouse), CDC/SCD patterns, and semantic modeling (dbt or equivalent).
- Proven track record implementing data quality frameworks, lineage, and performance/cost guardrails at scale.
- Strong understanding of multi-tenant SaaS architectures, security, and privacy (including core GDPR concepts).
Nice to Have
- Experience with Cosmos DB (including vector capabilities) and Azure AI Search; exposure to embedding pipelines for images/text.
- Background in feature stores, MLflow or similar model registries, and real-time inference pipelines.
- Knowledge of SQL Server internals, PolyBase/Serverless SQL; familiarity with Postgres.
- Experience rolling out Purview, governance frameworks, and data product operating models.
Our Client's Technology Stack
- Cloud & Data: Azure (ADLS Gen2, Data Factory, Synapse, Databricks, Functions, Event Hubs, Key Vault, Monitor)
- Storage & Compute: Delta/Parquet, Azure SQL/SQL Server, Cosmos DB (vector), Azure AI Search
- Languages & Tools: Python (pandas, PySpark, FastAPI for data services), dbt (or equivalent), GitHub Actions, Terraform/Bicep
- Observability: OpenTelemetry, Azure Monitor/App Insights, Sentry/Datadog (as applicable)