Data Engineer 2/3

About the job

Games24x7 is India's leading and most valuable multi-gaming unicorn. We're a full-stack gaming company, offering awesome game-playing experiences to over 100 million players through our products - RummyCircle, India's first and largest online rummy platform, My11Circle, the country's fastest-growing fantasy sports platform, and U Games, a cutting-edge gaming studio making casual games in India for players across the globe.

A pioneer in the online skill gaming industry in India, Games24x7 was founded in 2006 when two New York University-trained economists, Bhavin Pandya and Trivikraman Thampy, met at the computer lab and discovered their shared passion for online games. We've always been a technology company at heart, and over the last decade and a half, we've built the organisation on a strong foundation of the science of gaming, leveraging behavioural science, artificial intelligence, and machine learning to provide immersive and hyper-personalised gaming experiences to each of our players.

Backed by marquee investors including Tiger Global Management, The Raine Group, and Malabar Investment Advisors, Games24x7 is leading the charge in India's gaming revolution, constantly innovating and offering novel entertainment to players!

Our 700 passionate teammates create their magic from our offices in Mumbai, Bengaluru, New Delhi, Miami, and Philadelphia.

*Games24x7 is an equal opportunity employer, and all qualified applicants will receive consideration for employment without regard to race, colour, religion, sex, disability status, or any other characteristic protected by the law.*


General Accountabilities/Job Responsibilities

  • Ensure the right stakeholders get the right information at the right time

  • Understand analytical requirements and design data pipelines around them

  • Develop, test, and maintain optimal, scalable end-to-end data pipelines for both batch and real-time data processing.

  • Leverage open-source, AWS, and Databricks infrastructure and services to create and automate data pipelines

  • Work with stakeholders, including the Executive, Product, Data, and Development teams, to assist with data-related technical issues and support their data infrastructure needs.

  • Participate actively in data mart design discussions

  • Write code (queries/scripts) in Spark, Hive, Athena, etc. that is both functional and elegant, following appropriate design patterns

  • Build integrations for data ingestion across various types of data stores

  • Leverage data APIs provided by partner platforms to fetch data on a regular basis

  • Identify data quality issues and write data cleanup jobs

  • Build analytics tools that utilize the data pipeline to provide actionable insights into customer acquisition, operational efficiency, and other key business performance metrics

  • Create and maintain documentation of the entire data landscape.

Mandatory Requirements:

  • BE/B.Tech in Computer Science/IT

  • 5-7 years of experience working with Big Data technologies.

  • Extensive experience in end-to-end development and maintenance of batch data pipelines and near-real-time streaming data pipelines with single-digit-second latency.

  • Hands-on experience with Hadoop, Spark, Presto, Hive, Sqoop, MapReduce, etc.

  • Hands-on experience with Java and Python scripting.

  • Hands-on experience with Spark SQL, HiveQL, and SQL.

  • Experience integrating data from multiple data sources (Kafka, MongoDB, MySQL, Cassandra, third-party APIs, etc.)

  • Hands-on experience with messaging queues such as Kafka and Kinesis.

  • Working experience with ksqlDB, Druid, Aerospike, Flink, etc.

  • Knowledge of Linux and shell scripting

  • Knowledge of AWS services such as S3, EC2, EMR, Athena, Glue, Redshift, etc.

  • Deep understanding of the Hadoop ecosystem and strong conceptual knowledge of Hadoop architecture components

  • Strong experience with data warehousing and data modeling

  • Capable of processing large sets of structured, semi-structured and unstructured data.

  • Hands-on experience with Sqoop/Spark for importing data from RDBMS to HDFS and vice versa.

  • Ability to understand business requirements and build a data lake on Big Data technologies.

  • Experience in the end-to-end design and build of real-time pipelines (sub-second latency) is preferred.

  • Experience developing a data lake for real-time data streaming from multiple data sources.

  • Experience working on and leading proof-of-concept (PoC) projects in the Big Data space.

  • Experience designing data pipelines that can process data of any size and any file format.