Data Engineer (Apache Airflow/Spark/AWS) - Full Remote Portugal

ABOUT THE OPPORTUNITY

Join a mission-driven health technology company operating at the intersection of real-world data and patient outcomes. This organization is building a global evidence network that enables healthcare organizations to access harmonized, privacy-preserving datasets — directly impacting how treatments are evaluated and improved for patients worldwide.

This is a fully remote, senior-level consulting engagement where data pipelines are not just infrastructure — they are the product. You'll work in a fast-paced, engineering-led environment alongside talented people who care deeply about the quality and impact of their work.

Language requirement: English B2 (Upper Intermediate) minimum — daily collaboration with international teams is expected.

PROJECT & CONTEXT

You will be part of a core data engineering team responsible for designing and operating the pipelines that ingest, transform, and standardize real-world health data. The work involves translating complex source data structures into recognized healthcare data standards (FHIR, OMOP-CDM), ensuring data quality, security, and scalability across the platform.

The tech stack includes Apache Airflow, Apache Spark, AWS Redshift, Trino, ClickHouse, Dagster, and Airbyte, with heavy use of SQL and Python.

WHAT WE'RE LOOKING FOR (Required)

  • Advanced SQL skills — complex transformations, performance optimization, data modeling
  • Strong Python proficiency for pipeline development and automation
  • Proven senior-level track record as a Data Engineer, with experience scaling data products from pilot to production
  • Hands-on experience building and optimizing big data pipelines, architectures, and datasets using tools such as Apache Airflow and/or Apache Spark
  • Experience with AWS technologies (Redshift, S3, or equivalent managed services)
  • Strong analytical skills when working with unstructured and semi-structured datasets
  • Solid understanding of data science and Big Data concepts
  • Excellent written and verbal communication skills in English (B2+)
  • Self-driven, with an ownership mentality and a proactive approach to problem-solving

NICE TO HAVE (Preferred)

  • Experience with health data frameworks: OMOP-CDM and/or FHIR
  • Familiarity with additional stack components: Dagster, Trino, ClickHouse, Airbyte
  • Experience with data quality assurance, including writing test specifications and test case scenarios
  • Background working in healthcare or life sciences data environments
  • Experience in agile/continuous delivery engineering teams