Data Engineer
About the job
We are looking for a Data Engineer with working knowledge of building and maintaining scalable data pipelines, both on-premises and in the cloud. This includes understanding input and output data sources and upstream and downstream dependencies, and ensuring data quality. A key aspect of this role will be the deprecation of migrated workflows and, where needed, the migration of workflows into new systems. The ideal candidate is experienced with tools and technologies such as Git, Apache Airflow, Apache Spark, SQL, data migration, and data validation.

Key Responsibilities:

1. Workflow Deprecation
   o Plan and execute the deprecation of migrated workflows by evaluating current workflows' dependencies and consumption.
   o Use tools and best practices to identify, mark, and communicate deprecated workflows to stakeholders.
2. Data Migration
   o Plan and execute data migration tasks to move data between different storage systems or formats.
   o Ensure the accuracy and completeness of data during migration.
   o Implement strategies to accelerate data migration by backfilling, validating, and making new data assets ready for use.
3. Data Validation
   o Define and implement data validation rules to ensure data accuracy, completeness, and reliability.
   o Use data validation solutions and anomaly detection methods to monitor data quality.
4. Workflow Management
   o Use Apache Airflow to schedule, monitor, and automate data workflows.
   o Develop and manage DAGs (Directed Acyclic Graphs) in Airflow to orchestrate complex data processing tasks.
5. Data Processing
   o Develop and maintain data processing scripts using SQL and Apache Spark.
   o Optimize data processing for performance and efficiency.
6. Version Control
   o Use Git for version control, collaborating with the team to manage the codebase and track changes.
   o Follow best practices in code quality and repository management.
7. Continuous Improvement
   o Keep up to date with the latest developments in data engineering and related technologies.
   o Continuously improve and refactor data pipelines, tooling, and processes to enhance performance and reliability.

Skills and Qualifications:

· Bachelor's degree in Computer Science, Engineering, or a related field.
· Proficiency with Git for version control and collaborative development.
· Proficiency in SQL and experience with database technologies.
· Experience with data pipeline tools such as Apache Airflow.
· Strong knowledge of Apache Spark for data processing and transformation.
· Experience with data migration and validation techniques.
· Knowledge of data governance and security practices.
· Strong problem-solving skills and the ability to work independently and in a team.
· Ability to communicate with a global team.
· Ability to work in a high-performing team environment.
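To make the data validation responsibility above concrete, here is a minimal sketch in Python of rule-based record validation of the kind this role would define, e.g. when checking a migrated or backfilled dataset. The record fields (`user_id`, `amount`, `country`) and the specific rules are hypothetical examples, not part of this posting:

```python
# Minimal sketch of rule-based data validation (completeness and
# accuracy checks), e.g. run against a batch of migrated records.
# Field names and rules below are hypothetical illustrations.

REQUIRED_FIELDS = ("user_id", "amount", "country")

def validate_record(record: dict) -> list:
    """Return the list of rule violations for a single record."""
    violations = []
    # Completeness rule: every required field must be present and non-null.
    for field in REQUIRED_FIELDS:
        if record.get(field) is None:
            violations.append("missing:" + field)
    # Accuracy rule: amounts must be non-negative.
    amount = record.get("amount")
    if amount is not None and amount < 0:
        violations.append("negative:amount")
    return violations

def validate_batch(records: list) -> dict:
    """Summarize data quality over a batch, e.g. after a backfill."""
    bad = {}
    for i, record in enumerate(records):
        violations = validate_record(record)
        if violations:
            bad[i] = violations
    return {"total": len(records), "failed": len(bad), "violations": bad}

if __name__ == "__main__":
    batch = [
        {"user_id": 1, "amount": 10.0, "country": "US"},
        {"user_id": 2, "amount": -5.0, "country": "DE"},  # fails accuracy rule
        {"user_id": 3, "amount": 7.5, "country": None},   # fails completeness rule
    ]
    print(validate_batch(batch))
```

In practice, checks like these would typically run as an Airflow task after a migration or backfill step, with failures surfaced to stakeholders rather than silently dropped.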