Senior Data Engineer: Data Warehouse & Pipeline Optimization
About Mindtech
Mindtech is your gateway to exciting and impactful tech projects. We specialize in end-to-end software outsourcing, linking Latin American talent with global opportunities. Our fast, cost-effective approach ensures that our clients receive exceptional service and innovative solutions. With a diverse team of over 80 skilled professionals across Latin America and the US, we are committed to delivering software that drives success.
Position Overview
We are seeking a highly experienced and specialized Senior Data Engineer to manage and optimize our critical data infrastructure. This role requires day-one expertise in high-volume ETL/ELT pipeline maintenance, complex data warehousing on Snowflake, and advanced data modeling using SQL. The successful candidate must operate at a senior design level (8+ years of experience) within a regulated domain, leveraging the Microsoft Azure cloud ecosystem, with a specialized focus on optimizing orchestration systems.
Operational Requirement (Schedule)
The Senior Data Engineer must be available during the Enterprise Data Warehouse (EDW) batch job window, which runs daily from 12:30 AM to 3:30 AM EST. This is necessary to quickly address any issues or defects (such as pipeline failures or data inconsistencies) and rerun failed jobs before US users begin their workday. In addition to this critical early shift, the candidate is expected to be available until at least 1:00 PM EST (approximately 8 hours/day in total) to support overlap with US business hours and stakeholder collaboration. We value outcomes over micromanagement.
Key Responsibilities
Pipeline Optimization and Orchestration
- Optimize and maintain complex ETL/ELT pipelines developed using PySpark and Python to process high volumes of financial or insurance data.
- Specialize in optimizing the code and architecture of existing data transformation workflows, addressing performance bottlenecks, and migrating legacy processes to PySpark-based solutions to enhance scalability and execution speed.
- Develop, maintain, and ensure the reliability of automated data pipelines using Apache Airflow for orchestration and scheduling.
- Implement complex scheduling for critical tasks using tools such as cron jobs.
- Provide robust debugging and troubleshooting support for complex pipeline failures and resolve data inconsistencies in production environments.
Advanced Data Warehousing and Modeling
- Demonstrate strong knowledge in data warehousing practices, including designing and maintaining complex logical models such as star and snowflake schemas for EDW layers.
- Develop and maintain data solutions leveraging Snowflake for high-performance warehousing, including leveraging computational APIs built with Snowpark.
- Utilize advanced SQL for data transformation and loading, applying specialized load strategies that the role will design and maintain, including SCD Type 1, Type 2, and Type 3, incremental, and truncate-and-load methods.
- Implement and manage data manipulation and transformation logic using advanced SQL Server stored procedures.
- Utilize PySpark on Azure Databricks to transform and enrich structured and semi-structured data, ensuring high-quality output for downstream analytics.
Data Integrity and Domain Expertise
- Implement robust data validation frameworks using Python and PySpark to maintain data integrity, consistency, and accuracy across business functions.
- Ensure compliance and data security in cloud environments by applying appropriate encryption and access controls within Azure services.
- Apply knowledge of data processing requirements within a highly regulated industry, specifically Financial Services (e.g., Stock Market Analytics) or Insurance (e.g., Data Migration for Pacific Life or High Net Worth Insurance).
- Utilize Python libraries, including Pandas, for complex data analysis and manipulation.
Required Qualifications
- 8+ years of experience in data engineering and data warehouse environments.
- 3+ years of experience working with the Microsoft Azure cloud platform.
- Proven ability to design and implement complex data models (star/snowflake schemas).
- Mandatory experience in the Insurance or Financial Services industry.
- Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
We offer:
- 100% remote work
- Salary in USD
- Referral Program