Site Reliability Engineer

Kuala Lumpur, Federal Territory of Kuala Lumpur, Malaysia

COMPANY PROFILE:

Our client is a tech e-commerce scale-up that provides a single platform for customers to shop for the best price online. They also provide customers with data and insights on the latest trends and the e-commerce sector.

They are looking for a Site Reliability Engineer (SRE) to keep all services and production systems running smoothly. SREs ensure that services have reliability and uptime appropriate to users' needs, along with a fast rate of improvement.

You'll have the opportunity to work on complex challenges of scale, using your experience in coding, algorithms, and analysis.

RESPONSIBILITIES:

Engage in and improve the whole lifecycle of services - from inception and design, through to deployment, operation and refinement.
Collaborate with engineering teams on their infrastructure needs, and advise them throughout the development lifecycle.
Maintain services once they are live by measuring and monitoring availability, latency, and overall system health, within our Service Level Objectives.
Scale systems sustainably through mechanisms like automation; evolve systems by pushing for changes that improve reliability and velocity.
Practice sustainable incident response and blameless post-mortems.
Debug production issues across services, databases and levels of the stack.
Design, develop and manage monitoring tools that provide performance dashboards and alerts, and that collect the data required to proactively identify issues and recommend improvements.

REQUIREMENTS:

5-8 years of experience in provisioning environments, deploying applications, and maintaining infrastructure.
Professional experience using Python, Go, or Ruby.
Experience with deployment automation/configuration management tools like Chef, Ansible, Puppet, or Terraform.
Experience in a cloud-based environment such as AWS, GCP or Azure.
Extensive experience building scalable platforms using containers in a production environment.
Added bonus if you have experience operating distributed data storage systems at scale, especially Elasticsearch and SQL Azure.
Solid knowledge of continuous integration, continuous delivery, automated testing and all phases of the software development lifecycle.
Experience working in an agile, multicultural environment across multiple Scrum teams at the same time.
A Kaizen mindset: a spirit of continuous improvement on a personal level, and a habit of staying up to date with the latest technology trends professionally.
Ability to identify problems before they happen and implement solutions that detect and prevent outages.
Expertise in designing, analysing and troubleshooting large-scale distributed systems.
Ability to debug and optimize code, and to automate routine tasks.
Systematic problem-solving approach, coupled with effective communication skills and a sense of drive.
Understanding of CI/CD principles, Linux fundamentals, networking concepts and IP protocols.

HOW TO APPLY:

If you're interested, click the Apply button and attach your CV. For further information, feel free to speak to Ariff at +6012-9264666 or email him at ariff.w@aislingsearch.com
