Job Description:
About ShipIn:
At ShipIn Systems, we are driving operations for leaders in the maritime industry through our Visual Fleet Management Platform. With patented computer vision applications and real-time visual analytics, ShipIn’s platform proactively alerts shipowners, managers, and seafarers to onboard activity, improving safety and driving more efficient operations to help modernize the global supply chain.
Position Description:
We are looking for a skilled Data Engineer to join our Data Pipeline team within the R&D department. In this role, you will be responsible for developing and optimizing machine learning (ML) systems integrated into our large-scale data pipelines. You will work closely with data scientists, backend engineers, and other stakeholders to ensure the seamless integration of ML models into production. Your expertise in distributed systems and real-time data processing will be critical to advancing our ML capabilities.
Key Responsibilities:
- Design, implement, and optimize machine learning workflows within distributed data pipelines using Apache Kafka, Kafka Streams, Ray, Spark, and other technologies.
- Collaborate with data scientists to deploy and scale ML models, ensuring they operate efficiently in real-time environments.
- Develop high-performance data processing systems that handle both batch and stream processing.
- Implement and optimize data storage and retrieval mechanisms using ElasticSearch, S3, and PostgreSQL to support analytics jobs as well as ML model training and inference.
- Write clean, efficient code in Scala, Rust, and Python to build and maintain ML infrastructure.
- Ensure the scalability and reliability of ML systems, employing techniques such as fault tolerance, replication, and load balancing.
- Collaborate with cross-functional teams to integrate ML pipelines with product features and improve overall system performance.
- Participate in code reviews and provide constructive feedback, maintaining high-quality standards across the team.
- Stay current with the latest developments in machine learning, data engineering, and distributed systems, applying new insights to improve existing processes.
- Troubleshoot and resolve issues in ML pipelines, ensuring minimal downtime and optimal performance.
Qualifications / Experiences:
- 3+ years of experience working with machine learning systems and data pipelines.
- Strong hands-on experience with distributed data processing technologies such as Apache Kafka, Spark, and Ray.
- Proficiency in one or more of the following programming languages: Python, Scala, Rust.
- Solid understanding of machine learning workflows and how to integrate models into large-scale production environments.
- Experience with ElasticSearch and PostgreSQL for managing data storage and querying in ML systems.
- Familiarity with cloud-based infrastructure and containerization technologies such as Docker and Kubernetes.
- Strong problem-solving skills and the ability to troubleshoot complex issues in distributed systems.
- Excellent communication skills and the ability to collaborate effectively with cross-functional teams.
- Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.