Senior SRE/DevOps Engineer
About the job
We are looking for an experienced Senior SRE Engineer to join our team.
The ideal candidate will be responsible for improving reliability and performance,
partnering with teams, sharing knowledge, building a learning culture around
incidents, designing tools, contributing to documentation, and uplifting partner teams'
capabilities.
Main responsibilities:
Evolve systems by pushing for changes that improve reliability and latency.
Our day-to-day is driven by helping our product teams create robust software faster.
Introduce best practices into the teams around observability, SLOs and reliability.
Work in close collaboration with partner teams to shape the future roadmap to improve
reliability and establish strong operational readiness across teams.
Participate in system design consulting and capacity planning.
Identify areas for improvement across the organization and drive engineering-wide
technical change in the field of Site Reliability.
Share your knowledge by giving brown bags, tech talks, and evangelizing appropriate
tech and engineering best practices.
Partner with Nana to build a culture of rigorously learning from incidents.
Contribute to Root Cause Analysis (RCA) investigations and follow up on each incident to
ensure the appropriate action items are in place and prioritized.
Design tools to help our entire engineering organization be as productive as possible.
Contribute to documentation and to uplifting partner teams.
Qualifications required:
Bachelor's degree or equivalent practical experience.
5 years of hands-on experience in Site Reliability and Observability Engineering: debugging,
diagnosing and correcting errors, and resolving high-severity incidents.
Commercial experience in one of the following languages: Python or Go.
Ability to think about systems: edge cases, failure modes, behaviors, specific implementations.
Experience building solutions in distributed systems for high-volume transactions and/or
developing support-focused tooling.
Solid experience with cloud infrastructure and tooling (GCP, Kubernetes, CI/CD pipelines,
Terraform and Pulumi).
Experience working with microservices and the microservices resilience domain.
Experience working with various monitoring and alerting tools.