Data Engineer (Apache Airflow/Spark/AWS) - Full Remote Portugal
ABOUT THE OPPORTUNITY
Join a mission-driven health technology company operating at the intersection of real-world data and patient outcomes. This organization is building a global evidence network that enables healthcare organizations to access harmonized, privacy-preserving datasets — directly impacting how treatments are evaluated and improved for patients worldwide.
This is a fully remote, senior-level consulting engagement where data pipelines are not just infrastructure — they are the product. You'll work in a fast-paced, engineering-led environment alongside talented people who care deeply about the quality and impact of their work.
Language requirement: English B2 (Upper Intermediate) minimum — daily collaboration with international teams is expected.
PROJECT & CONTEXT
You will be part of a core data engineering team responsible for designing and operating the pipelines that ingest, transform, and standardize real-world health data. The work involves translating complex source data structures into recognized healthcare data standards (FHIR, OMOP-CDM), ensuring data quality, security, and scalability across the platform.
The tech stack includes: Apache Airflow, Apache Spark, AWS Redshift, Trino, ClickHouse, Dagster, and Airbyte, with heavy use of SQL and Python.
WHAT WE'RE LOOKING FOR (Required)
- Advanced SQL skills — complex transformations, performance optimization, data modeling
- Strong Python proficiency for pipeline development and automation
- Proven senior-level track record as a Data Engineer, with experience scaling data products from pilot to production
- Hands-on experience building and optimizing big data pipelines, architectures, and datasets using tools such as Apache Airflow and/or Apache Spark
- Experience with AWS technologies (Redshift, S3, or equivalent managed services)
- Strong analytical skills when working with unstructured and semi-structured datasets
- Solid understanding of data science and Big Data concepts
- Excellent written and verbal communication skills in English (B2+)
- Self-driven, with an ownership mentality and a proactive approach to problem-solving
NICE TO HAVE (Preferred)
- Experience with health data frameworks: OMOP-CDM and/or FHIR
- Familiarity with additional stack components: Dagster, Trino, ClickHouse, Airbyte
- Experience with data quality assurance, including writing test specifications and test case scenarios
- Background working in healthcare or life sciences data environments
- Experience in agile/continuous delivery engineering teams