Job Opening: Site Reliability Engineers (SREs)


Hiring Position: Site Reliability Engineers (SREs), open to all nationalities

Working Conditions: 100% on-site (BTS accessible)

Location: Bangkok, Thailand

Pay Rate: THB 75,000 to THB 100,000

__________________________________________________________________

About the Role

Our client is looking for a Site Reliability Engineer (SRE) to improve the reliability, performance, and efficiency of their software and IT services. This role acts as a bridge between development and operations, ensuring seamless deployment, automation, and monitoring of critical systems.

Key Responsibilities

  • Monitor and Improve System Performance: Track system health, identify bottlenecks, and enhance service availability.
  • Automate Processes: Reduce manual work by writing scripts and implementing automation tools.
  • Service Reliability & Risk Mitigation: Define key reliability metrics (SLIs, SLOs) and manage error budgets to reduce risks.
  • Incident Management: Take ownership of platform-related incidents, ensure quick resolution, and enhance long-term stability.
  • Collaboration with Development Teams: Work closely with engineers to streamline deployments and improve system efficiency.

Qualifications

- Junior Level: 3 to 5 years of experience as a Software Engineer or System Administrator, with a strong interest in becoming an SRE.

- Senior Level: Minimum 5 years of experience as an SRE.

- Proficiency in at least one scripting or programming language (Bash, Python, PowerShell, etc.).
- Experience with monitoring tools like Datadog, Grafana, Elasticsearch, or Kibana.
- Familiarity with cloud services such as AWS.
- Strong communication skills in English (spoken & written).

Nice to Have (Bonus Skills)

- Knowledge of IT operations and best practices for high-availability systems.
- Experience with CI/CD and infrastructure automation tools (GitHub Actions, Jenkins, Ansible, Terraform, etc.).
- Understanding of containerization and orchestration (Docker, Kubernetes, Helm).
- Familiarity with IT service management (incident, problem, and change management).