G64 - Full Stack Engineer
About the job
Key Responsibilities
Site Reliability & Operations
- Manage and improve the reliability, availability, and operational excellence of the SHIP-HATS platform
- Define, monitor, and maintain Service Level Objectives (SLOs) and Service Level Indicators (SLIs)
- Lead incident management, troubleshooting, root cause analysis, and post-mortem reviews
- Drive continuous improvements to reduce operational toil and prevent recurring incidents
- Perform capacity planning, performance tuning, and system optimisation
Observability & Monitoring
- Design and implement observability solutions across logging, metrics, and distributed tracing
- Build dashboards, alerts, and monitoring strategies to provide deep visibility into platform health
- Manage and maintain monitoring stacks such as Prometheus, Grafana, ELK, or equivalent tools
Infrastructure & Automation
- Develop and maintain Infrastructure-as-Code (IaC) solutions using tools such as Terraform or Ansible
- Automate infrastructure provisioning, deployment, and operational workflows
- Support both cloud and on-premises infrastructure environments
- Contribute to CI/CD pipeline improvements and platform automation initiatives
Collaboration & Engineering Excellence
- Work closely with engineering and product teams to embed reliability and operability practices into the software development lifecycle
- Review system architectures and recommend reliability, scalability, and resilience improvements
- Advocate for DevSecOps, automation, and operational best practices across teams
Requirements
- Degree in Computer Science, Information Technology, Engineering, or related disciplines
- Hands-on experience with Kubernetes and container orchestration platforms
- Experience with CI/CD, DevSecOps, and collaboration tools such as GitLab, Jira, Confluence, Fortify, or similar platforms
- Proficiency in at least one scripting or programming language such as Python, Go, or Bash
- Experience with Infrastructure-as-Code tools such as Terraform or Ansible
- Familiarity with cloud platforms such as AWS, Azure, or GCP
- Experience implementing and managing observability and monitoring solutions such as ELK Stack, Prometheus, or Grafana
- Good understanding of networking, system reliability, security hardening, and operational best practices
- Strong analytical, troubleshooting, and problem-solving skills
- Ability to work effectively in Agile and cross-functional environments
- Good communication and stakeholder management skills
Good to Have
- Experience with GitOps workflows and service mesh technologies such as Istio
- Familiarity with secrets management tools such as HashiCorp Vault
- Exposure to government ICT standards, IM8 policies, or regulated environments
- AI-native mindset and software engineering capabilities
- Experience supporting large-scale enterprise or public sector platforms