About the job G03 - DevOps Engineer
We are seeking a DevOps Engineer to join our Data and Artificial Intelligence Platforms (DAIP) team, which builds and operates:
- DataHive, the next-generation WOG Data Discovery & Request Platform, which caters to a wider range of WOG requirements and use cases. DataHive has been enhanced to support both data requestors and providers: 1) data requestors/consumers get a streamlined data search and request process, with options to access data via curated APIs and through an analytics workbench; 2) data providers gain more flexibility and control in managing datasets on the platform itself.
- Data products that compute and serve derived data for operations and service delivery across multiple agencies. By adopting a Build Once, Use Many Times approach for such data products, we aim to achieve cost savings, better data quality and faster time-to-market for new services.
- Fraud Detection Platform (FDP), a central platform for Grant Fraud Insights. It draws data from TC Business & Individual, combines it into a graph, and generates insights on businesses and individuals for government grant, procurement and finance due-diligence checks and audits.
What the role is
You will work on both small and large-scale projects, building and maintaining the infrastructure behind them. The role includes:
- Managing the development, deployment, orchestration and maintenance of data pipelines for our Data Science products (a minimal orchestration sketch follows this list)
- Implementing DevOps architecture and providing operational support
- Architecting and planning cloud deployments (private and public cloud)
- Developing and managing processes, automation, best practices and documentation
- Developing and operating continuous integration and deployment (CI/CD) pipelines
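To give a concrete flavor of the pipeline-orchestration work, here is a minimal sketch of an Airflow DAG (Airflow is one of the tools listed under requirements below). The DAG id, task names and extract/load functions are hypothetical placeholders, not a description of our actual pipelines.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


# Hypothetical pipeline steps; real tasks would call out to
# actual extract/transform/load logic.
def extract():
    print("pulling source data")


def load():
    print("writing curated output")


with DAG(
    dag_id="example_data_pipeline",   # hypothetical name
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)

    extract_task >> load_task  # run extract before load
```

In practice, the role involves owning the deployment and operation of DAGs like this, not just authoring them.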
What it is like working here
We build products that serve a variety of agency users, who use them to solve meaningful problems for our society, from transportation to education to healthcare. The public sector is full of opportunities where even the simplest software can have a big impact on people's lives.
- Rapid Prototyping - Instead of spending too much time debating ideas, we prefer to test them. This surfaces potential problems quickly and, more importantly, easily conveys to others what is possible.
- Reliable Productization - To scale an idea, a prototype or a Minimum Viable Product to a software product, we scrutinize and commit to its usability, reliability, scalability and maintainability.
- Ownership - In addition to technical responsibilities, this means having ideas on how things should be done and taking responsibility for seeing them through. Building something that you believe in is the best way to build something good.
- Continuous Learning - Working on new ideas often means not fully understanding what you are working on. Taking time to learn new architectures, frameworks, technologies, and even languages is not just encouraged but essential.
As we often deal with big data and heavy computing requirements, you should also be able to take a long-term, strategic view of the platforms you work on and help provide this perspective to the team. To do so, you will:
- Effectively prioritize and execute tasks in a high-pressure environment
- Develop and maintain internal engineering productivity tools and environments
- Perform independent research into product and environment issues as required
- Automate monitoring to effectively detect, predict and prevent issues in the environment and code base (see the sketch after this list)
- Future-proof the technical environments and ensure extremely high levels of automation, availability, scalability and resilience
- Code hands-on and mentor others, working in highly collaborative teams and building quality environments
- Have knowledge of, and continuously learn, a wide range of open-source technologies and configurations
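To illustrate what monitoring automation can mean in practice, here is a minimal, hypothetical health-check sketch. The endpoint URL, interval and threshold are assumptions; a production setup would use a proper monitoring stack (e.g. Prometheus alerting) rather than an ad-hoc script.

```python
import time
import urllib.error
import urllib.request

SERVICE_URL = "http://localhost:8080/health"  # hypothetical endpoint
CHECK_INTERVAL_SECONDS = 30
FAILURE_THRESHOLD = 3  # alert after this many consecutive failures


def is_healthy(url: str) -> bool:
    """Return True if the endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


def main() -> None:
    failures = 0
    while True:
        if is_healthy(SERVICE_URL):
            failures = 0
        else:
            failures += 1
            if failures >= FAILURE_THRESHOLD:
                # In practice this would page an on-call engineer
                # (PagerDuty, Slack webhook, etc.), not just print.
                print(f"ALERT: {SERVICE_URL} failed {failures} checks")
        time.sleep(CHECK_INTERVAL_SECONDS)


if __name__ == "__main__":
    main()
```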
What we are looking for
The customers for our products are normally agency users, so breadth of knowledge of government IT infrastructure and experience with government networks will help. Since our direction is cloud-first, you will ideally have some experience with patch/update scheduling and knowledge of security incident response procedures. A disciplined approach and strong problem-solving instincts are fundamental to success. Your aptitude for getting tasks done and your attitude toward continuous learning matter more than any formal certification. To succeed, you will need to possess some of the following:
- Excellent problem solving and methodical troubleshooting skills
- Strong knowledge of and experience in DevOps automation, containerization and orchestration using tools such as Ansible, Airflow, Docker, Kubernetes, Terraform and Artifactory/Sonatype Nexus
- Cloud deployment and management experience on AWS and GCP
- Strong scripting skills, e.g. Python, Bash, JavaScript, Scala, Rust, Go
- Strong understanding of Apache Spark/Flink, Hadoop, distributed file systems, resource scaling/scheduling, and streaming message queues such as RabbitMQ and Kafka (a minimal streaming sketch follows this list)
- Strong understanding of virtualization and networking concepts
- Experience with patch maintenance, regression testing and security incident response
- Experience with interactive workloads and machine learning toolkits, and how they integrate with cloud computing, e.g. Databricks, KX
- Experience with highly scalable distributed systems
- Experience with on-premises deployments, and with government application and networking infrastructure/routing
- Breadth of knowledge - OS, networking, distributed computing, cloud computing
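As a flavor of the Spark-plus-Kafka territory above, the sketch below shows a minimal PySpark Structured Streaming job that counts events per key from a Kafka topic. The broker address and topic name are hypothetical, and the job assumes the spark-sql-kafka connector package is available on the cluster; it is an illustration, not a description of our stack.

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("kafka-stream-sketch")
    .getOrCreate()
)

# Read a stream of raw events from Kafka.
events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # hypothetical broker
    .option("subscribe", "events")                     # hypothetical topic
    .load()
)

# Count events per key as they arrive.
counts = (
    events.selectExpr("CAST(key AS STRING) AS key")
    .groupBy("key")
    .count()
)

# Write running counts to the console; a real job would sink to
# object storage, a database, or another topic, with checkpointing.
query = (
    counts.writeStream
    .outputMode("complete")
    .format("console")
    .start()
)
query.awaitTermination()
```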