Senior Data Platform Engineer
Location: Remote (LATAM, with daily overlap with US business hours)
Team: Engineering / Data Platform
Client: AI-powered property-tax assessment platform for US county governments
Summary
You'll own the data pipelines that turn millions of messy, fragmented external records into clean, reliable, audit-ready data that an AI-powered platform can trust. You'll work directly with the client and a small senior team to ingest new counties, harden multi-tenant data isolation, and make pipeline jobs autonomous and reproducible at scale. This is a high-accuracy domain where a wrong number has legal and revenue consequences, so correctness and observability matter as much as throughput. Because the team is small and senior, this is not an isolated, ETL-only role: you should be comfortable tracing data issues across pipelines, PostgreSQL, backend APIs, frontend workflows, product behavior, and user-facing validation.
Requirements (Must-Haves)
Excellent English communication skills — you'll interact directly with the client and US-based stakeholders, defend technical decisions, and explain data issues to non-engineers.
7+ years building and operating production data pipelines, with real experience processing large-scale datasets (millions of records) — not toy projects or tutorials.
Expert Python for data processing, ingestion, and transformation.
Deep SQL and PostgreSQL — complex queries, indexing, query optimization, and debugging performance under real production load.
Hands-on AWS experience for production data workflows, including Glue Jobs, Lambda, Step Functions, EventBridge, RDS/PostgreSQL, object storage, monitoring, retries, failure handling, and cost-aware execution. Experience with adjacent AWS services is expected as the architecture evolves.
Proven ability to design data workflows for quality, correctness, and auditability — including validation, deduplication, anomaly detection, reconciliation against trusted sources, and clear failure visibility before bad data reaches users or downstream systems.
Comfort owning data workflows end-to-end — ingestion, transformation, orchestration, scheduling, observability, monitoring, alerting, failure recovery, and continuous reliability improvement with minimal hand-holding.
Backend/API development experience, preferably with Node.js/TypeScript: you can contribute to backend services around data workflows, expose pipeline outputs through APIs, debug integration issues, and understand how data changes affect product behavior.
Working knowledge of LLM-driven data pipelines (OpenAI, Claude, Gemini, or similar hosted models, as well as LLMs deployed on-premises), including validating and structuring AI-generated output.
Experience with multi-tenant data architectures and tenant data isolation.
Workflow orchestration experience for autonomous, step-controllable jobs.
Experience with observability and operational tooling for data workflows, including structured logs, metrics, alerts, traces, run history, failure visibility, and cost monitoring.
Requirements (Nice-to-Haves)
Data warehousing experience (Snowflake, BigQuery, Redshift, or Databricks), including warehouse architecture, dbt, analytics engineering, and data modeling.
React/frontend experience (TypeScript), strongly preferred because pipeline output directly affects user-facing workflows, QA screens, validation tools, and product behavior.
Infrastructure-as-code (Terraform) and Docker for reproducible local and cloud environments.
Bonus Points
Experience with government, civic, real-estate/proptech, or other regulated high-accuracy data where outputs must be auditable and legally defensible.
Experience building data-quality or QA tooling that lets non-engineers validate results at scale.
Track record of taking over a fragile, fast-moving pipeline and making it boring, reliable, and observable.
Open-source contributions, writing, or speaking on data engineering.