Job Openings
Data Engineer - (PySpark+Hadoop)
About the job
Key Responsibilities:
- Create Spark jobs in Scala and PySpark for data transformation and aggregation (a minimal PySpark sketch follows this list).
- Produce unit tests for Spark transformations and helper methods (a test sketch also follows the list).
- Use Spark and Spark SQL to read Parquet data and create tables in Hive using the Scala API.
- Work closely with Business Analysts to review test results and obtain sign-off.
- Prepare necessary design and operations documentation for future use.
- Perform peer code quality reviews and ensure compliance with quality standards.
- Engage in hands-on coding, often in a pair programming environment.
- Collaborate with teams to build quality code and ensure smooth production deployments.
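As a rough illustration of the transformation and Hive-loading responsibilities above, the sketch below reads Parquet data, applies an aggregation, and persists the result as a Hive table. It uses PySpark rather than the Scala API mentioned above, and the paths, table name, and column names (txn_ts, account_id, amount) are hypothetical placeholders, not details from this posting.

```python
from pyspark.sql import SparkSession, functions as F

# Hypothetical source path and target table; replace with project-specific values.
SOURCE_PATH = "hdfs:///data/raw/transactions/"
TARGET_TABLE = "analytics.daily_txn_summary"

spark = (
    SparkSession.builder
    .appName("daily-txn-aggregation")
    .enableHiveSupport()  # required so saveAsTable writes through the Hive metastore
    .getOrCreate()
)

# Read the Parquet source.
txns = spark.read.parquet(SOURCE_PATH)

# Transformation + aggregation: daily transaction totals per account.
summary = (
    txns
    .withColumn("txn_date", F.to_date("txn_ts"))
    .groupBy("account_id", "txn_date")
    .agg(
        F.count("*").alias("txn_count"),
        F.sum("amount").alias("total_amount"),
    )
)

# Persist as a Hive table; overwrite keeps re-runs idempotent.
summary.write.mode("overwrite").saveAsTable(TARGET_TABLE)
```

In a Cloudera environment a job like this would typically be packaged and launched with spark-submit; cluster-specific configuration is omitted here.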
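For the unit-testing responsibility, one common pattern is to keep transformations as plain functions and exercise them with a local SparkSession under pytest. The function name, columns, and expected values below are illustrative assumptions, not part of this posting.

```python
import pytest
from pyspark.sql import SparkSession, functions as F


def summarize_by_day(df):
    """Hypothetical transformation under test: daily totals per account."""
    return (
        df.withColumn("txn_date", F.to_date("txn_ts"))
          .groupBy("account_id", "txn_date")
          .agg(F.sum("amount").alias("total_amount"))
    )


@pytest.fixture(scope="session")
def spark():
    # Small local session is enough for transformation tests.
    return (
        SparkSession.builder
        .master("local[1]")
        .appName("unit-tests")
        .getOrCreate()
    )


def test_summarize_by_day(spark):
    rows = [
        ("a1", "2024-01-01 09:00:00", 10.0),
        ("a1", "2024-01-01 17:30:00", 5.0),
        ("a2", "2024-01-02 08:15:00", 7.5),
    ]
    df = spark.createDataFrame(rows, ["account_id", "txn_ts", "amount"])

    result = {
        (r["account_id"], str(r["txn_date"])): r["total_amount"]
        for r in summarize_by_day(df).collect()
    }

    assert result[("a1", "2024-01-01")] == 15.0
    assert result[("a2", "2024-01-02")] == 7.5
```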
Requirements:
- 4-10 years of experience as a Hadoop Data Engineer, with strong expertise in Hadoop, Spark, Scala, PySpark, Python, Hive, Impala, CI/CD, Git, Jenkins, Agile Methodologies, DevOps, and Cloudera Distribution.
- Strong knowledge of data warehousing methodologies.
- Minimum of 4 years of relevant experience in Hadoop and Spark/PySpark.
- Strong understanding of enterprise data architectures and data models.
- Experience in the core banking and finance domains.
- Familiarity with Oracle, Spark Streaming, Kafka, and machine learning.
- Cloud experience, particularly with AWS, is a plus.
- Ability to develop applications on the Hadoop tech stack efficiently and effectively, delivering on time, to specification, and cost-effectively.