Job Opening: Site Reliability Engineer (DevOps, Linux)

About the Job

Our esteemed client, an established MNC, is searching for a Site Reliability Engineer:

Job Responsibilities:

  • Oversee observability, capacity planning, issue analysis, and troubleshooting for large-scale, cloud-native applications in a microservices architecture.
  • Debug and automate routine tasks across operating systems, networks, databases, and application servers, leveraging programming skills beyond basic scripting.
  • Apply DevOps processes and programming knowledge in at least one of the following languages: Java, Python, or Go.
  • Use scripting and infrastructure-as-code tools such as Shell, Terraform, Ansible, Chef, or Puppet for automation and infrastructure management.
  • Possess deep expertise in Unix/Linux systems, virtual machines, containers, container management systems, enterprise cloud platforms, and data structures.
  • Manage the lifecycle of services from launch through deployment, operation, and optimization, ensuring reliability and a seamless user experience.
  • Monitor and enhance service reliability by measuring availability, latency, and system health while implementing sustainable incident response strategies.
  • Gather and analyze metrics to optimize performance and troubleshoot priority-level (P0/P1/P2/P3) issues.
  • Contribute to system design recommendations, platform management, and balancing feature development speed with reliability based on service level objectives.
  • Continuously measure and optimize system performance, anticipating and addressing potential user needs while driving innovation and improvements.

Job Requirements:

  • Bachelor's degree or higher in Computer Science, Electronics & Communication, or a related field.
  • Minimum 2 years' experience in a related field.
  • Strong understanding of SRE principles and DevOps processes.
  • Exposure to data-driven decision-making and trend analysis.
  • Experience designing automation frameworks using SaltStack, Spinnaker, or StackStorm.
  • Experience managing large-scale big data clusters and optimizing data processing efficiency.
  • Knowledge of Chaos Engineering principles for system resilience testing.
  • Expertise in large-scale container management platforms with auto-scaling and intelligent scheduling.
  • Experience in big data analysis, data science, or large-scale data development.
  • Understanding of SIEM (Security Information and Event Management), threat modeling, and vulnerability detection.
  • Hands-on experience in cloud services network design, policy creation, and performance tuning.
  • Proficiency in database consistency checks, slow query optimization, and middleware performance tuning for RDBMS, NoSQL, and distributed caches.

Additional Information:

  • Salary: Up to MYR 9,000 
  • Working Location: Cyberjaya, MY
  • Working Hours: Monday to Friday, 9am - 6pm
  • Contract: 1 year, renewable

Interested candidates may click "APPLY NOW" or send a resume in MS Word format to:

tstar.recruit.pte.ltd+candidate+jl8vrx7wr@mail.manatal.com

*We regret that only shortlisted candidates will be notified*

TSTAR Recruit Pte Ltd | EA Licence No.: 22C1039 | Co. Reg. No.: 202207088Z | EA Registration No.: R1767370 (SIA KAI SING)