Data Engineer (Apache Airflow/Spark/AWS) - Full Remote Portugal

Portugal

ABOUT THE OPPORTUNITY

Join a mission-driven health technology company operating at the intersection of real-world data and patient outcomes. This organization is building a global evidence network that enables healthcare organizations to access harmonized, privacy-preserving datasets — directly impacting how treatments are evaluated and improved for patients worldwide.

This is a fully remote, senior-level consulting engagement where data pipelines are not just infrastructure — they are the product. You'll work in a fast-paced, engineering-led environment alongside talented people who care deeply about the quality and impact of their work.

Language requirement: English B2 (Upper Intermediate) minimum — daily collaboration with international teams is expected.

PROJECT & CONTEXT

You will be part of a core data engineering team responsible for designing and operating the pipelines that ingest, transform, and standardize real-world health data. The work involves translating complex source data structures into recognized healthcare data standards (FHIR, OMOP-CDM), ensuring data quality, security, and scalability across the platform.

The tech stack includes Apache Airflow, Apache Spark, AWS Redshift, Trino, ClickHouse, Dagster, and Airbyte, with heavy use of SQL and Python.

WHAT WE'RE LOOKING FOR (Required)

Advanced SQL skills — complex transformations, performance optimization, data modeling
Strong Python proficiency for pipeline development and automation
Proven senior-level track record as a Data Engineer, with experience scaling data products from pilot to production
Hands-on experience building and optimizing big data pipelines, architectures, and datasets using tools such as Apache Airflow and/or Apache Spark
Experience with AWS technologies (Redshift, S3, or equivalent managed services)
Strong analytical skills when working with unstructured and semi-structured datasets
Solid understanding of data science and big data concepts
Excellent written and verbal communication skills in English (B2+)
Self-driven, ownership mentality, proactive approach to problem-solving

NICE TO HAVE (Preferred)

Experience with health data frameworks: OMOP-CDM and/or FHIR
Familiarity with additional stack components: Dagster, Trino, ClickHouse, Airbyte
Experience with data quality assurance, including writing test specifications and test-case scenarios
Background working in healthcare or life sciences data environments
Experience working in agile, continuous-delivery engineering teams
