Big Data & Cloud Data Engineer

Position Overview

We are seeking a Big Data & Cloud Data Engineer to design, implement, and manage large-scale data processing systems using big data technologies (Hadoop, Spark, Kafka) and cloud-based data ecosystems (Azure, GCP, AWS), enabling advanced analytics and real-time data processing capabilities across our enterprise.

Key Responsibilities

Big Data Platform Development

  • Design and implement Hadoop ecosystems including HDFS, YARN, and distributed computing frameworks

  • Develop real-time and batch processing applications using Apache Spark (Scala, Python, Java)

  • Configure Apache Kafka for event streaming, data ingestion, and real-time data pipelines (see the Spark/Kafka sketch after this list)

  • Implement data processing workflows using Apache Airflow, Oozie, and workflow orchestration tools

  • Build NoSQL database solutions using HBase, Cassandra, and MongoDB for high-volume data storage
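
To give a concrete flavor of the Spark and Kafka work listed above, the following is a minimal sketch of a PySpark Structured Streaming job that consumes JSON events from a Kafka topic. The broker address, topic name, and event schema are placeholders rather than details of this role, and the job assumes the spark-sql-kafka connector package is on the classpath.

```python
# Minimal PySpark Structured Streaming sketch: consume JSON events from Kafka.
# Broker, topic, and schema below are illustrative placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = (
    SparkSession.builder
    .appName("kafka-ingest-sketch")
    .getOrCreate()
)

# Hypothetical event schema for illustration only.
event_schema = StructType([
    StructField("event_id", StringType()),
    StructField("event_type", StringType()),
    StructField("event_time", TimestampType()),
])

raw = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
    .option("subscribe", "events")                     # placeholder topic
    .load()
)

# Kafka delivers the payload as bytes; cast to string and parse the JSON body.
events = (
    raw.selectExpr("CAST(value AS STRING) AS json")
    .select(from_json(col("json"), event_schema).alias("e"))
    .select("e.*")
)

query = (
    events.writeStream
    .format("console")       # in practice, a durable sink with checkpointing
    .outputMode("append")
    .start()
)
query.awaitTermination()
```

In production the console sink would be replaced by a durable, checkpointed sink such as Parquet or a lakehouse table.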

Cloud Data Architecture

  • Design multi-cloud data architectures using Azure Data Factory, AWS Glue, and Google Cloud Dataflow

  • Implement data lakes and lakehouses using Azure Data Lake, AWS S3, and Google Cloud Storage

  • Configure cloud-native data warehouses including Snowflake, BigQuery, and Azure Synapse Analytics

  • Build serverless data processing solutions using AWS Lambda, Azure Functions, and Google Cloud Functions (see the sketch after this list)

  • Implement containerized data applications using Docker, Kubernetes, and cloud container services
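
As one small illustration of the serverless pattern above, here is a sketch of a Python AWS Lambda handler that reacts to S3 object-created notifications. The bucket layout and what a real pipeline would do next (cataloging the file, kicking off a Glue or Step Functions job) are assumptions, not specifics of this position.

```python
# Sketch of an AWS Lambda handler triggered by S3 object-created events.
# Bucket/key conventions and downstream processing are illustrative only.
import json
import urllib.parse

import boto3

s3 = boto3.client("s3")


def handler(event, context):
    """Log basic metadata for each newly created S3 object.

    In a real pipeline this is where the file would be validated,
    registered in a catalog, or handed to a downstream job.
    """
    results = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])

        head = s3.head_object(Bucket=bucket, Key=key)
        results.append({
            "bucket": bucket,
            "key": key,
            "size_bytes": head["ContentLength"],
            "content_type": head.get("ContentType"),
        })

    print(json.dumps(results))
    return {"processed": len(results)}
```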

Data Pipeline Engineering

  • Develop ETL/ELT pipelines for structured and unstructured data processing (see the sketch after this list)

  • Create real-time streaming analytics using Kafka Streams, Apache Storm, and cloud streaming services

  • Implement data quality frameworks, monitoring, and alerting for production data pipelines

  • Build automated data ingestion from various sources including APIs, databases, and file systems

  • Design data partitioning, compression, and optimization strategies for performance
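
As a rough sketch of the ETL and data quality bullets above, the DAG below uses Airflow's TaskFlow API (the `schedule` argument assumes Airflow 2.4+) to pick up a file, write it as compressed Parquet, and apply a simple quality gate. The file paths, column names, and checks are hypothetical.

```python
# Sketch of a daily ETL DAG with a simple quality gate, using Airflow's TaskFlow API.
# Source path, column names, and output location are placeholder assumptions.
from datetime import datetime

import pandas as pd
from airflow.decorators import dag, task


@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
def daily_orders_etl():
    @task
    def extract() -> str:
        # Placeholder: a CSV drop landed by an upstream system.
        return "/data/landing/orders.csv"

    @task
    def transform(src: str) -> str:
        df = pd.read_csv(src)
        df["order_date"] = pd.to_datetime(df["order_date"])
        out = "/data/staging/orders.parquet"
        # Columnar format with snappy compression keeps downstream scans cheap.
        df.to_parquet(out, compression="snappy", index=False)
        return out

    @task
    def quality_check(path: str) -> str:
        df = pd.read_parquet(path)
        # Fail the run loudly rather than propagating an empty or null-ridden load.
        if len(df) == 0:
            raise ValueError("no rows loaded")
        if df["order_id"].isna().any():
            raise ValueError("null order_id values found")
        return path

    quality_check(transform(extract()))


daily_orders_etl()
```

In a real deployment the extract step would pull from an API or database connection managed by Airflow, and the quality gate would typically live in a dedicated framework rather than inline checks.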

Platform Administration & Optimization

  • Manage cluster provisioning, scaling, and resource optimization across big data platforms

  • Monitor system performance, troubleshoot issues, and implement capacity planning strategies

  • Configure security frameworks including Kerberos, Ranger, and cloud IAM services

  • Implement backup, disaster recovery, and high availability solutions

  • Optimize query performance and implement data governance policies

Required Qualifications

Technical Skills

  • 5+ years of experience with big data technologies (Hadoop, Spark, Kafka, Hive, HBase)

  • Strong programming skills in Python, Scala, Java, and SQL for data processing

  • Expert knowledge of at least one major cloud platform (Azure, AWS, or GCP) and its data services

  • Experience with containerization (Docker, Kubernetes) and infrastructure as code (Terraform, CloudFormation)

  • Proficiency in stream processing frameworks and real-time analytics architectures

  • Knowledge of data modeling, schema design, and database optimization techniques

Data Engineering Skills

  • Experience with data pipeline orchestration and workflow management tools

  • Strong understanding of distributed systems, parallel processing, and scalability patterns

  • Knowledge of data formats (Parquet, Avro, ORC) and serialization frameworks

  • Experience with version control, CI/CD pipelines, and DevOps practices for data platforms

Preferred Qualifications

  • Bachelor's degree in Computer Science, Data Engineering, or related field

  • Cloud certifications (Azure Data Engineer, AWS Data Analytics, Google Cloud Data Engineer)

  • Experience with machine learning platforms and MLOps frameworks

  • Background in data governance, data cataloging, and metadata management

  • Knowledge of emerging technologies (Delta Lake, Apache Iceberg, dbt)