Site Reliability Engineer (DevOps, Linux)
About the job
Our esteemed client, an established MNC, is searching for a Site Reliability Engineer:
Job Responsibilities:
- Oversee observability, capacity planning, issue analysis, and troubleshooting for large-scale, cloud-native applications in a microservices architecture.
- Debug and automate routine tasks across operating systems, networks, databases, and application servers, leveraging programming skills beyond basic scripting.
- Apply DevOps processes and programming knowledge in at least one of the following languages: Java, Python, or Go.
- Use scripting and infrastructure-as-code tools such as Shell, Terraform, Ansible, Chef, or Puppet for automation and infrastructure management.
- Possess deep expertise in Unix/Linux systems, virtual machines, containers, container management systems, enterprise cloud platforms, and data structures.
- Manage the lifecycle of services, from launch through deployment, operation, and optimization, ensuring reliability and a seamless user experience.
- Monitor and enhance service reliability by measuring availability, latency, and system health while implementing sustainable incident response strategies.
- Gather and analyze metrics to optimize performance and troubleshoot priority-level (P0/P1/P2/P3) issues.
- Contribute to system design recommendations and platform management, and balance feature development speed against reliability based on service level objectives.
- Continuously measure and optimize system performance, anticipating and addressing potential user needs while driving innovation and improvements.
Job Requirements:
- Bachelor's degree or higher in Computer Science, Electronics & Communication, or a related field.
- Minimum 2 years' experience in a related field.
- Strong understanding of SRE principles and DevOps processes.
- Exposure to data-driven decision-making and trend analysis.
- Experience designing automation frameworks using SaltStack, Spinnaker, or StackStorm.
- Experience managing large-scale big data clusters and optimizing data processing efficiency.
- Knowledge of Chaos Engineering principles for system resilience testing.
- Expertise in large-scale container management platforms with auto-scaling and intelligent scheduling.
- Experience in big data analysis, data science, or large-scale data development.
- Understanding of SIEM (Security Information and Event Management), threat modeling, and vulnerability detection.
- Hands-on experience in cloud services network design, policy creation, and performance tuning.
- Proficiency in database consistency checks, slow query optimization, and middleware performance tuning for RDBMS, NoSQL, and distributed caches.
Additional Information:
- Salary: Up to MYR 9,000
- Working Location: Cyberjaya, MY
- Working Hours: Monday to Friday, 9am - 6pm
- Contract: 1 year, renewable
For interested parties, kindly click on "APPLY NOW" or send in your resume in MS Word format to
tstar.recruit.pte.ltd+candidate+jl8vrx7wr@mail.manatal.com
*We regret that only shortlisted candidates will be notified*
TSTAR Recruit Pte Ltd | EA Licence No.: 22C1039 | Co. Reg. No.: 202207088Z | EA Registration No.: R1767370 (SIA KAI SING)