G64 - Full Stack Engineer

Singapore

Key Responsibilities

Site Reliability & Operations

Manage and improve the reliability, availability, and operational excellence of the SHIP-HATS platform
Define, monitor, and maintain Service Level Objectives (SLOs) and Service Level Indicators (SLIs)
Lead incident management, troubleshooting, root cause analysis, and post-mortem reviews
Drive continuous improvements to reduce operational toil and prevent recurring incidents
Perform capacity planning, performance tuning, and system optimisation

Observability & Monitoring

Design and implement observability solutions across logging, metrics, and distributed tracing
Build dashboards, alerts, and monitoring strategies to provide deep visibility into platform health
Manage and maintain monitoring stacks such as Prometheus, Grafana, the ELK Stack, or equivalent tools

Infrastructure & Automation

Develop and maintain Infrastructure-as-Code (IaC) solutions using tools such as Terraform or Ansible
Automate infrastructure provisioning, deployment, and operational workflows
Support both cloud and on-premises infrastructure environments
Contribute to CI/CD pipeline improvements and platform automation initiatives

Collaboration & Engineering Excellence

Work closely with engineering and product teams to embed reliability and operability practices into the software development lifecycle
Review system architectures and recommend reliability, scalability, and resilience improvements
Advocate for DevSecOps, automation, and operational best practices across teams

Requirements

Degree in Computer Science, Information Technology, Engineering, or related disciplines
Hands-on experience with Kubernetes and container orchestration platforms
Experience with CI/CD and DevSecOps tools such as GitLab, Jira, Confluence, Fortify, or similar platforms
Proficiency in at least one scripting or programming language such as Python, Go, or Bash
Experience with Infrastructure-as-Code tools such as Terraform or Ansible
Familiarity with cloud platforms such as AWS, Azure, or GCP
Experience implementing and managing observability and monitoring solutions such as ELK Stack, Prometheus, or Grafana
Good understanding of networking, system reliability, security hardening, and operational best practices
Strong analytical, troubleshooting, and problem-solving skills
Ability to work effectively in Agile and cross-functional environments
Good communication and stakeholder management skills

Good to Have
