Data Engineer

Columbus, Indiana, United States

Title: Data Engineer

Location: Columbus, IN.

About the Role
We are hiring a Data Engineer with strong hands-on experience in building high-performance data pipelines for a heavy data analytics project. The candidate must be excellent at writing complex aggregations, understanding business processes and analytical requirements, and designing scalable data lake and data warehouse solutions. Experience across multiple data platforms (Databricks, Snowflake, Azure Data Factory, Synapse, etc.) is a strong advantage.

Key Responsibilities
1. Data Pipeline & ETL/ELT Development
Develop, optimize, and productionize Spark (PySpark/Scala) pipelines.
Ingest, transform, cleanse, and aggregate large datasets from varied sources.
Implement scalable ETL/ELT logic for batch and near-real-time pipelines.
Apply best practices in partitioning, caching, Delta Lake optimization, and performance tuning.
2. Heavy Data Analytics & Business Understanding
Write complex aggregation logic (window functions, rollups, grouping sets, analytical functions).
Understand business KPIs, metrics, and analytical use cases.
Translate business needs into technical transformations and data models.
Validate data outputs against business logic and analytics expectations.
Collaborate with analysts on calculations: weekly/monthly aggregates, trend lines, performance metrics, dimensional rollups.
Ensure accuracy, consistency, and traceability of business-critical metrics.
3. Data Lake Engineering
Build and maintain multi-layer Data Lake architectures (Bronze/Silver/Gold).
Work with Parquet, Delta Lake, ORC, and columnar storage formats.
Implement schema evolution, auditing, and metadata strategies.
4. Data Warehouse Engineering
Design dimensional models: Star Schema and Snowflake Schema.
Build fact and dimension tables supporting analytics and reporting.
Optimize table structures, keys, and partitioning strategies.
5. Databricks (Added Advantage)
Develop notebooks/jobs using PySpark/Scala.
Manage clusters, workflows, and Delta Live Tables.
Implement best practices for performance and cost efficiency.
6. SQL Engineering
Strong command of SQL for aggregations, analytical functions, joins, profiling, and validation.
Write and optimize complex queries supporting dashboards, metrics, and reports.
7. Cloud Data Platforms
Azure: Data Factory, Synapse Analytics, ADLS Gen2, Azure Functions (optional).
Snowflake: Virtual Warehouses, Snowpipe, Streams & Tasks, performance tuning.
8. Data Quality & Documentation
Validate transformation logic against business rules.
Document data flows, transformation rules, aggregation logic, and data dictionary/metadata.
Work with QA and analysts to ensure outputs match business expectations.

Required Qualifications
5+ years of hands-on data engineering experience.
Strong programming skills: Spark, Scala, Python.
Strong SQL skills (aggregations, analytical functions, large joins).
Experience with Data Lake and Data Warehouse concepts.
Experience with Spark-based processing (Delta Lake optimization, shuffle tuning, partitioning).
Experience with at least one cloud data ecosystem (Azure/AWS/GCP).

Preferred Skills
Experience with Databricks (highly desirable).
Experience with Snowflake or another modern cloud data warehouse.
Experience with ADF/Synapse/Airflow/dbt for orchestration.
Knowledge of CI/CD for data pipelines.
Experience with large-scale data analytics environments.

Soft Skills
Strong understanding of business logic behind analytics outputs.
Ability to translate business metrics into technical transformations.
Strong problem-solving and debugging skills.
Good communication and cross-team collaboration.
