Site Reliability Engineer
COMPANY PROFILE:
Our client is a tech e-commerce scale-up that provides a single platform for customers to shop for the best price online. They also provide customers with data and insights on the latest trends and the e-commerce sector.
They are looking for Site Reliability Engineers (SREs), who are responsible for keeping all services and production systems running smoothly. SREs ensure that services have reliability and uptime appropriate to users' needs, and a fast rate of improvement.
You'll have the opportunity to work on complex challenges of scale, using your experience in coding, algorithms, and analysis.
RESPONSIBILITIES:
- Engage in and improve the whole lifecycle of services - from inception and design, through to deployment, operation and refinement.
- Collaborate with engineering teams on their infrastructure needs, and advise them throughout the development lifecycle.
- Maintain services once they are live by measuring and monitoring availability, latency, and overall system health, within our Service Level Objectives.
- Scale systems sustainably through mechanisms like automation; evolve systems by pushing for changes that improve reliability and velocity.
- Practice sustainable incident response and blameless post-mortems.
- Debug production issues across services, databases and levels of the stack.
- Design, develop and manage monitoring tools that provide performance dashboards and alerts, and that collect the data required to proactively identify issues and/or recommend improvements.
REQUIREMENTS:
- 5-8 years of experience in provisioning environments, deploying applications, and maintaining infrastructure.
- Professional experience using Python, Go, or Ruby.
- Experience with deployment automation/configuration management tools like Chef, Ansible, Puppet, or Terraform.
- Experience in cloud-based environments such as AWS, GCP or Azure.
- Extensive experience building scalable platforms leveraging containers in a production environment.
- Added bonus if you have experience operating distributed data storage systems at scale, especially Elasticsearch and SQL Azure.
- Solid knowledge of continuous integration, continuous delivery, automated testing and all phases of the software development lifecycle.
- Experience working in an agile, multicultural environment across multiple Scrum teams at the same time.
- A Kaizen mindset: a spirit of continuous improvement on a personal level, and staying up to date with the latest technology trends professionally.
- Ability to identify problems before they happen and implement solutions that detect and prevent outages.
- Expertise in designing, analysing and troubleshooting large-scale distributed systems.
- Ability to debug and optimize code, and to automate routine tasks.
- Systematic problem-solving approach, coupled with effective communication skills and a sense of drive.
- Understanding of CI/CD principles, Linux fundamentals, networking concepts and IP protocols.
HOW TO APPLY:
- If you're interested, click the Apply button and attach your CV. For further information, feel free to speak to Ariff at +6012-9264666 or email him at ariff.w@aislingsearch.com.