Data Engineer
Job Description:
As a Data Engineer, you will be responsible for designing, implementing, and maintaining scalable data pipelines and systems to support data processing, analysis, and visualization. You will collaborate closely with cross-functional teams to understand data requirements, optimize data workflows, and ensure data quality and reliability.
Responsibilities:
1. Data Pipeline Development: Design, build, and optimize data pipelines to extract, transform, and load (ETL) data from various sources into data warehouses or data lakes, ensuring scalability, reliability, and efficiency.
2. Data Modeling: Develop and maintain data models and schemas to support business analytics, reporting, and machine learning applications, ensuring data integrity, consistency, and performance.
3. Data Integration: Integrate data from disparate sources, including databases, APIs, streaming platforms, and third-party systems, to provide unified views of data for analysis and reporting purposes.
4. Data Quality Assurance: Implement data quality checks and monitoring processes to identify and rectify data anomalies, inconsistencies, and errors, ensuring high data quality and reliability.
5. Performance Optimization: Optimize data processing and query performance through indexing, partitioning, and caching techniques, leveraging distributed computing frameworks and database technologies.
6. Infrastructure Management: Manage and maintain data infrastructure components, including databases, data warehouses, ETL pipelines, and data processing clusters, ensuring availability, scalability, and security.
7. Collaboration and Documentation: Collaborate with data analysts, data scientists, and other stakeholders to understand data requirements and deliver solutions that meet business needs. Document data pipelines, workflows, and system configurations for knowledge sharing and future reference.
Requirements:
- Bachelor's degree in Computer Science, Engineering, or related field. Master's degree preferred.
- 4 years of proven experience as a Data Engineer or in a similar role, with a strong track record of designing and implementing data pipelines and systems.
- Proficiency in programming languages such as Python, Java, or Scala, and experience with data processing frameworks such as Apache Spark, Apache Flink, or Hadoop.
- Solid understanding of database technologies, including SQL and NoSQL databases (e.g., PostgreSQL, MySQL, MongoDB, Cassandra).
- Familiarity with cloud platforms such as AWS, Azure, or GCP, and experience with cloud-based data services (e.g., Amazon Redshift, Azure Synapse Analytics (formerly Azure SQL Data Warehouse), Google BigQuery).
- Strong analytical and problem-solving skills, with the ability to troubleshoot complex data issues and optimize data workflows for performance and scalability.
- Excellent communication and collaboration skills, with the ability to work effectively in cross-functional teams and communicate technical concepts to non-technical stakeholders.