Senior Site Reliability Engineer
About the job
You will work closely with development teams to implement SRE best practices, automate operations, and ensure the reliability of our distributed systems. This role combines software engineering with systems engineering to create sustainable, resilient infrastructure solutions that support our rapidly growing platform.
Responsibilities:
- Design, implement, and maintain scalable infrastructure automation using Infrastructure as Code principles
- Lead and implement SLI/SLO frameworks to measure and improve system reliability
- Develop and maintain monitoring, alerting, and observability solutions
- Participate in on-call rotations and lead incident response efforts
- Drive postmortem processes and implement systematic improvements
- Collaborate with development teams to improve system reliability, performance, and efficiency
- Build and maintain tools and frameworks to automate routine operations
- Contribute to capacity planning and performance optimization initiatives
- Champion SRE best practices and mentor junior team members
Qualifications:
- 5+ years of experience in Site Reliability Engineering or similar roles
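- Experience with Go (Golang) is an added advantage
- Deep knowledge of Cloudflare
- CNCF certification preferred; familiarity with Rancher for managing Kubernetes (K8s) clusters
- Hands-on experience with key and secrets management systems (KMS) such as HashiCorp Vault Enterprise
- Deep knowledge of ETL (extract, transform, load) processes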
- Strong knowledge of monitoring systems, logging infrastructure, and observability tools
- Solid understanding of networking concepts and distributed systems
- Experience with CI/CD pipelines and automation tools
- Strong problem-solving skills and ability to debug complex production issues
- Experience with high-volume, distributed systems in a production environment
- Knowledge of security best practices and compliance requirements
- Excellent communication skills and ability to collaborate effectively with global teams
Preferred Qualifications:
- Experience with chaos engineering practices
- Contributions to open-source projects
- Experience with performance tuning and optimization