Senior Data Engineer

About the job

Our client is a leading insights company, using market-leading data to drive better marketing decisions for global companies.


Job Responsibilities

SQL Databases

  • Design and refine schemas for OLTP and OLAP workloads (Azure SQL, Synapse, Delta Lake), incorporating partitioning, indexing, and row-level security for multi-tenant isolation.
  • Define and manage data contracts and versioning, oversee schema evolution, and implement CDC and SCD patterns.
  • Optimise performance through query tuning, resource configuration, caching strategies, and cost controls.
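The SCD patterns mentioned above can be sketched in miniature. Below is an illustrative Type 2 upsert over an in-memory dimension table; in production this logic would typically live in a Delta Lake `MERGE` or a T-SQL `MERGE` statement, and the function name and record shape here are assumptions for the sketch.

```python
from datetime import date

def scd2_upsert(dimension, incoming, today=None):
    """Apply a Slowly Changing Dimension Type 2 upsert (illustrative sketch).

    dimension: list of dicts with keys key, value, valid_from, valid_to, current
    incoming:  dict with keys key and value
    A changed current row is closed out and a new current row appended;
    an unchanged row makes re-runs an idempotent no-op.
    """
    today = today or date.today().isoformat()
    for row in dimension:
        if row["key"] == incoming["key"] and row["current"]:
            if row["value"] == incoming["value"]:
                return dimension  # no change: safe to re-run
            row["valid_to"] = today   # close out the superseded version
            row["current"] = False
    dimension.append({
        "key": incoming["key"], "value": incoming["value"],
        "valid_from": today, "valid_to": None, "current": True,
    })
    return dimension
```

The key property for pipeline reliability is that replaying the same change produces no duplicate rows.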

Data Pipelines

  • Architect and build ELT/ETL workflows across batch and streaming using Azure Data Factory/Synapse/Databricks, Event Hubs/Service Bus, Functions, and containerised workloads (Container Apps/AKS).
  • Deliver reliable, observable pipelines (idempotent, retryable, lineage-aware) with clear SLAs/SLOs and operational runbooks.
  • Implement CI/CD for data workloads (dbt/SQL projects, PySpark jobs, automated tests) using GitHub Actions and infrastructure-as-code (Terraform/Bicep).
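The idempotent, retryable pipeline property described above can be sketched as a single step wrapper. This is a minimal illustration, not the client's implementation: the function name, the `processed_keys` set standing in for a durable idempotency store, and the retry/backoff policy are all assumptions.

```python
import time

def run_step(step_fn, payload, *, processed_keys, key_fn, retries=3, backoff=0.0):
    """Run one pipeline step idempotently with bounded retries (sketch).

    processed_keys: set of idempotency keys already handled (durable in practice)
    key_fn: derives a stable idempotency key from the payload
    Duplicate deliveries are skipped; transient failures are retried with
    exponential backoff before the last error is re-raised.
    """
    key = key_fn(payload)
    if key in processed_keys:        # re-delivered message: safe no-op
        return "skipped"
    last_err = None
    for attempt in range(retries):
        try:
            step_fn(payload)
            processed_keys.add(key)  # mark done only after success
            return "done"
        except Exception as err:     # assume transient; retry
            last_err = err
            time.sleep(backoff * (2 ** attempt))
    raise last_err
```

In a real deployment the idempotency store would be durable (e.g. a table keyed by message ID) so that Event Hubs re-deliveries remain no-ops across restarts.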

Data Enrichment

  • Define and manage enrichment layers such as UPC/GS1, OCR/EXIF metadata, taxonomies, embeddings, and third-party data integrations.
  • Curate gold/semantic data models for analytics and product APIs, including ownership of feature/metric definitions and documentation.
  • Collaborate with Data Science/ML teams to productionise feature stores, model outputs, drift monitoring, and evaluation datasets.
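As one concrete flavour of the UPC/GS1 enrichment above, a validation step can verify the GS1 mod-10 check digit before a code enters the enrichment layer. The function name is illustrative; the check-digit algorithm itself is the standard UPC-A rule.

```python
def upc_check_digit_ok(code):
    """Validate a 12-digit UPC-A code via the GS1 mod-10 check digit.

    Digits in odd positions (1-indexed) are weighted 3, even positions 1;
    the check digit brings the weighted total to a multiple of 10.
    """
    if len(code) != 12 or not code.isdigit():
        return False
    digits = [int(c) for c in code]
    total = sum(d * (3 if i % 2 == 0 else 1) for i, d in enumerate(digits[:11]))
    return (10 - total % 10) % 10 == digits[11]
```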

Azure Architecture & Governance

  • Own the reference data architecture across ADLS Gen2, Synapse/Databricks, Azure SQL/SQL Server, Cosmos DB (including vector), Azure AI Search, Key Vault, and Purview.
  • Embed security and compliance by default: encryption, secret management, RBAC/ABAC, retention policies, and GDPR/SOC 2 aligned controls.
  • Drive observability using OpenTelemetry and Azure Monitor/App Insights, plus data quality checks, freshness SLAs, and lineage via Purview.
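The freshness SLAs mentioned above reduce to a small lag comparison. This sketch is illustrative: in practice the result would feed Azure Monitor alerts or a Purview data-quality view, and the function name and return shape are assumptions.

```python
from datetime import datetime, timedelta, timezone

def freshness_check(last_loaded_at, sla, now=None):
    """Return (status, lag) for a dataset freshness SLA (sketch).

    last_loaded_at: datetime of the most recent successful load
    sla: timedelta the dataset is allowed to lag behind now
    """
    now = now or datetime.now(timezone.utc)
    lag = now - last_loaded_at
    status = "fresh" if lag <= sla else "stale"
    return status, lag
```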

Examples of What You'll Build

  • A robust image ingestion and enrichment pipeline that validates assets, extracts OCR/UPC, computes embeddings, tracks lineage, and publishes search-ready views.
  • A hybrid retrieval layer (vector + filters) across Cosmos DB and Azure AI Search to power similarity search and recommendations.
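The hybrid retrieval layer above combines metadata pre-filtering with vector ranking. A toy in-memory version of that pattern looks like the following; Cosmos DB and Azure AI Search perform the equivalent work server-side at scale, and the document shape here is an assumption for the sketch.

```python
import math

def hybrid_search(docs, query_vec, filters, top_k=2):
    """Rank documents by cosine similarity after applying metadata filters.

    docs: list of {"id", "vec", "meta"} dicts; filters: required meta key/values.
    Illustrates the vector + filters pattern, not a production retriever.
    """
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb) if na and nb else 0.0

    candidates = [
        d for d in docs
        if all(d["meta"].get(k) == v for k, v in filters.items())  # pre-filter
    ]
    ranked = sorted(candidates, key=lambda d: cosine(d["vec"], query_vec),
                    reverse=True)
    return [d["id"] for d in ranked[:top_k]]
```

Filtering before ranking keeps the similarity computation confined to the tenant- or category-scoped candidate set.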

Minimum Qualifications

  • Very strong Python and SQL skills (comfortable analysing complex query plans and working with both PySpark and pandas).
  • 7+ years' experience in data engineering/architecture with end-to-end ownership of production SQL databases and pipelines.
  • Deep hands-on experience with Azure data services: ADLS Gen2, Data Factory/Synapse/Databricks, Azure SQL/SQL Server, Functions, Event Hubs/Service Bus, Key Vault.
  • Solid background in data modeling (star/snowflake, Data Vault/Lakehouse), CDC/SCD patterns, and semantic modeling (dbt or equivalent).
  • Proven track record implementing data quality frameworks, lineage, and performance/cost guardrails at scale.
  • Strong understanding of multi-tenant SaaS architectures, security, and privacy (including core GDPR concepts).

Nice to Have

  • Experience with Cosmos DB (including vector capabilities) and Azure AI Search; exposure to embedding pipelines for images/text.
  • Background in feature stores, MLflow or similar model registries, and real-time inference pipelines.
  • Knowledge of SQL Server internals, PolyBase/Serverless SQL; familiarity with Postgres.
  • Experience rolling out Purview, governance frameworks, and data product operating models.

Our Client's Technology Stack

  • Cloud & Data: Azure (ADLS Gen2, Data Factory, Synapse, Databricks, Functions, Event Hubs, Key Vault, Monitor)
  • Storage & Compute: Delta/Parquet, Azure SQL/SQL Server, Cosmos DB (vector), Azure AI Search
  • Languages & Tools: Python (pandas, PySpark, FastAPI for data services), dbt (or equivalent), GitHub Actions, Terraform/Bicep
  • Observability: OpenTelemetry, Azure Monitor/App Insights, Sentry/Datadog (as applicable)