
About the Job: Data Engineer

Position Overview

We are looking for an experienced Data Engineer with deep hands‑on expertise in Cloudera (CDH/CDP) and Apache NiFi to design, build, and operate scalable data pipelines within a regulated banking environment. The ideal candidate understands distributed data processing, real‑time ingestion patterns, and secure data handling consistent with banking compliance standards.

In this role, you will collaborate closely with Data Architecture, Platform Engineering, Application Teams, and Information Security to ensure reliable, high‑performance data movement across the enterprise.

Key Responsibilities

  • Design, build, and maintain end‑to‑end data ingestion and transformation pipelines using:
    • Apache NiFi for flow orchestration
    • Spark (batch and streaming) for transformation
    • Hive/Impala for processing and analytics
  • Implement real-time and batch ingestion patterns using Kafka, NiFi, Sqoop, APIs, and file-based ingestion.
  • Develop reusable ingestion templates, processors, and flow patterns for consistent onboarding.
  • Work within Cloudera CDH/CDP environments, leveraging services such as HDFS, YARN, Hive, Impala, Ranger, Kafka, Oozie, and HBase.
  • Integrate pipelines with data governance tools, metadata catalogues, and secure storage zones.
  • Collaborate with Cloudera administrators to optimize service-level performance for data workloads.
  • Implement data quality checks, schema validation, and reconciliation logic.
  • Optimize Spark jobs, SQL queries, and NiFi flows for performance and cost efficiency.
  • Troubleshoot bottlenecks across compute, storage, and network layers.
  • Ensure adherence to banking regulatory standards (e.g., MAS TRM, GDPR, PCI-DSS, internal audit policies).
  • Implement strong security patterns: Kerberos, Ranger policies, encryption, tokenization, masking.
  • Maintain data lineage, audit systems, and access control logs for governance and compliance.
  • Develop monitoring dashboards for NiFi pipelines and Cloudera services.
  • Write automation scripts using Python, Shell, or Ansible.
  • Support release cycles, CAB approvals, deployments, and production support during incidents.
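To illustrate the data quality and reconciliation work described above, here is a minimal sketch in Python. The column names, types, and tolerance are hypothetical, not part of any specific bank's pipeline; in practice these checks would typically run inside a Spark job or a NiFi processor:

```python
# Illustrative data-quality checks: per-record schema validation and a
# source-to-target row-count reconciliation. All names are hypothetical.

EXPECTED_SCHEMA = {"txn_id": str, "amount": float, "currency": str}


def validate_schema(record: dict) -> list:
    """Return a list of schema violations for a single record."""
    errors = []
    for column, expected_type in EXPECTED_SCHEMA.items():
        if column not in record:
            errors.append("missing column: %s" % column)
        elif not isinstance(record[column], expected_type):
            errors.append("bad type for %s: %s"
                          % (column, type(record[column]).__name__))
    return errors


def reconcile_counts(source_count: int, target_count: int,
                     tolerance: float = 0.0) -> bool:
    """True when the target row count is within tolerance of the source."""
    if source_count == 0:
        return target_count == 0
    return abs(source_count - target_count) / source_count <= tolerance
```

Checks like these are usually parameterized per source system and wired into the pipeline's monitoring, so a failed reconciliation raises an alert rather than silently loading partial data.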

Qualifications

  • Bachelor's or Master's degree in Computer Science, Engineering, or a related discipline.
  • 3–8 years of experience as a Data Engineer in large-scale distributed data environments.
  • Strong hands-on experience with:
    • Apache NiFi (flow design, processors, controllers, templates, versioning)
    • Cloudera CDH/CDP ecosystem
    • Spark (PySpark or Scala), Hive, Impala
    • Kafka (producers, consumers, schemas)
  • Experience with HDFS, distributed computing, and data modelling concepts.
  • Strong programming skills: Python, Scala, SQL, Shell scripting.

Preferred Banking Domain Experience

  • Data engineering experience supporting:
    • Payments
    • Customer 360 platforms
    • Regulatory reporting
  • Familiarity with:
    • Banking-grade network segmentation (DMZ, SDZ, HDZ)
    • Secure API and file transfer patterns
    • Strict SDLC processes (CAB approvals, ITIL operations)

Soft Skills

  • Strong problem-solving skills with the ability to troubleshoot end‑to‑end pipeline issues.
  • Ability to work in cross-functional teams and communicate technical concepts clearly.
  • Detail-oriented, structured, and comfortable working under regulatory pressure.

Certifications (Nice to Have)

  • Cloudera Data Platform Data Engineer Certification.
  • CCA Spark and Hadoop Developer or legacy Cloudera certifications (CCP, CCDH).
  • NiFi or data integration certifications.
  • Cloud certifications (AWS/GCP/Azure Big Data).