Senior SRE/DevOps Engineer

Cairo, Egypt


Senior SRE Engineer

About the job

We are looking for an experienced Senior SRE Engineer to join our team.

The ideal candidate will be responsible for improving reliability and performance, partnering with teams, sharing knowledge, building a learning culture around incidents, designing tools, contributing to documentation, and uplifting partner teams' capabilities.

Main responsibilities:

Evolve systems by pushing for changes that improve reliability and latency.

Our day-to-day is driven by helping our product teams create robust software faster.

Introduce best practices into the teams around observability, SLOs, and reliability.

Work in close collaboration with partner teams to shape the future roadmap to improve reliability and establish strong operational readiness across teams.

Participate in system design consulting and capacity planning.

Identify areas for improvement across the organization and drive engineering-wide technical change in the field of Site Reliability.

Share your knowledge by giving brown bags, tech talks, and evangelizing appropriate tech and engineering best practices.

Partner with Nana to build a culture of rigorously learning from incidents.

Contribute to Root Cause Analysis (RCA) investigations and follow up on each incident to ensure the appropriate action items are in place and prioritized.

Design tools to help our entire engineering organization be as productive as possible.

Contribute to documentation and to uplifting partner teams.

Qualifications required:

Bachelor's degree or equivalent practical experience.

5 years of hands-on experience in Site Reliability and Observability Engineering: debugging, diagnosing and correcting errors, and resolving high-severity incidents.

Commercial experience in one of the following languages: Python or Go.

Ability to think about systems: edge cases, failure modes, behaviors, specific implementations.

Experience building solutions in distributed systems for high-volume transactions and/or developing support-focused tooling.

Solid experience with cloud infrastructure and tooling (GCP, Kubernetes, CI/CD pipelines, Terraform and Pulumi).

Experience working with microservices and the microservices resilience domain.

Experience working with various monitoring and alerting tools.
