Senior Data Engineer
Description:
TrueData Solutions is a leader in independent identity resolution data services, connecting people and households to their digital devices across the globe. As a trusted partner of the largest enterprise data companies, TrueData continues to build privacy-centric solutions that help clients achieve their addressability goals.
As a Senior Data Engineer, you will be responsible for designing, building, and maintaining robust data pipelines and storage solutions on AWS. Your expertise in Apache Spark, Apache Cassandra, RocksDB, and machine learning will be crucial in processing and analyzing large datasets to support our data-driven decision-making. You will work closely with our AI/ML team to implement and optimize machine learning models, leveraging techniques such as regression, classification, clustering, and deep learning.
We are a small, remote-first, in-person-enabled tech company looking to grow our presence in Mexico City with this role. This role will report to the VP of Engineering.
Responsibilities:
Collaborate with product managers, designers, and other stakeholders to understand project requirements and translate them into technical specifications.
Design, develop, and maintain scalable data pipelines using Apache Spark.
Collaborate with data analysts and data engineers to integrate ML models into data pipelines, utilizing techniques such as regression, classification, clustering, and deep learning.
Ensure data quality, integrity, and security across all data systems.
Develop and maintain ETL processes to ingest, process, and store data from various sources.
Monitor and troubleshoot data pipelines and storage systems to ensure high availability and performance.
Work with cross-functional teams to understand data requirements and deliver solutions that meet business needs.
Stay up to date with the latest industry trends and technologies in data engineering, big data, and machine learning.
Requirements:
Essential:
Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
5+ years of experience in data engineering or a related role.
Strong proficiency in Apache Spark for data processing and analysis.
Extensive experience with NoSQL databases, particularly Apache Cassandra and RocksDB.
Solid understanding of machine learning concepts and experience integrating ML models into data pipelines.
Proficiency in programming languages such as Scala, Python, or Java.
Experience with data pipeline orchestration tools (e.g., Apache Airflow, Luigi).
Strong problem-solving skills and the ability to troubleshoot complex data issues.
Excellent communication skills and the ability to work collaboratively in a team environment.
Preferred:
AWS developer certification, such as AWS Certified Developer - Associate.
Proficiency in version control systems (such as Git) and agile development methodologies.
Knowledge of containerization and orchestration tools (e.g., Docker, Kubernetes).
Familiarity with CI/CD pipelines and DevOps practices.
Experience with real-time data processing frameworks (e.g., Apache Kafka, Flink).
Benefits:
Competitive Salary
Health Care Stipend
Work From Home Stipend
Paid Time Off and Vacation Bonus
Other standard CDMX employee benefits
About TrueData:
Founded in 2013, TrueData is a rapidly growing company with employees concentrated in Los Angeles, Chicago, New York, and Mexico City. TrueData empowers growth champions by delivering data solutions that are accessible, actionable, and true.
TrueData believes that high-quality data will power the AI and algorithms of the future and is at the forefront of this trend. TrueData delivers real, people-based data that is never inferred, modeled, or extrapolated. TrueData is unwaveringly dedicated to delivering real, quality data at scale.
TrueData works with the broadest set of data types on the market, providing brands, marketers, AdTech vendors, and publishers with a single platform for an array of data services, from identity resolution and audience segment activation to flexible, controlled data licensing, all while remaining privacy-safe.