Site Reliability Engineer

Singapore, Singapore

Cluster Operations & Management

Manage and maintain container clusters (Kubernetes, Docker) and open-source component clusters (Kafka, Redis, Elasticsearch) across multiple business units
Ensure optimal performance, scalability, and reliability of distributed systems

Infrastructure Platform Development

Design, build, and enhance infrastructure operation platforms
Develop and maintain systems for infrastructure management, CI/CD pipelines, monitoring/alerting, and centralized logging
Drive platform standardization and automation initiatives

High Availability & Reliability

Ensure maximum uptime for production services through proactive monitoring and incident response
Continuously optimize service architecture, deployment strategies, and operational processes
Implement and maintain SLA/SLO frameworks and reliability engineering practices

Automation & Process Improvement

Lead the development of automated operations and maintenance systems
Create self-service tools and workflows to improve team productivity
Establish best practices for infrastructure as code and configuration management

Required Qualifications

Experience & Education

2+ years of hands-on experience in Systems Operations, DevOps, or Site Reliability Engineering (SRE)
Bachelor's degree in Computer Science, Engineering, or related technical field preferred

Cloud & Infrastructure

Experience with public cloud platforms (AWS, Azure, or GCP) is highly valued
Strong understanding of large-scale internet architecture and distributed systems
Proven experience with infrastructure monitoring, logging, and observability tools

Technical Skills

Proficiency in scripting and automation using Shell, Python, or similar languages
Strong knowledge of containerization technologies (Kubernetes, Docker)
Hands-on experience operating production-grade container clusters and managing CI/CD pipelines
Strong familiarity with common infrastructure components: Nginx, MySQL, Redis, Kafka, Elasticsearch

Advanced Networking (Preferred)
