Site Reliability Engineer - Infrastructure

About the job

Our client is an international tech consulting company with more than 25 years of experience delivering solutions that support companies' business operations and digital transformation.

Tasks and Responsibilities:

Infrastructure Delivery:

  • Design, implement, and maintain scalable, secure, and highly available infrastructure, primarily in Azure.
  • Manage Kubernetes clusters, ensuring optimal performance, scalability, and security.
  • Apply and administer Infrastructure as Code (IaC) solutions using Terraform.
  • Develop and optimize CI/CD pipelines to enable efficient infrastructure delivery.

Observability:

  • Set up and maintain monitoring, logging, and alerting systems to ensure infrastructure reliability and quick incident resolution.
  • Use Datadog or similar tools to improve system visibility.
  • Work closely with development teams to define and track Service Level Indicators (SLIs), Service Level Objectives (SLOs), and Error Budgets.

Collaboration with Development Teams:

  • Collaborate with development teams to ensure the infrastructure supports application requirements and performance needs.
  • Provide technical guidance on best practices for deployment, scalability, and reliability.
  • Act as a bridge between operations and development, fostering a DevOps culture.

Requirements:

Education:

  • Bachelor's or Master's degree in Computer Science, Engineering, or a related field.

Experience:

  • At least 3 years of experience in Site Reliability Engineering (SRE) roles.

Certifications:

  • Certified Kubernetes Administrator (CKA) and Microsoft Azure Fundamentals (AZ-900).

Technical Skills:

  • Strong expertise in Azure services, including networking, storage, and computing.
  • Hands-on experience with Kubernetes (preferably AKS), including cluster management and troubleshooting.
  • Proficiency in Infrastructure as Code (IaC) tools such as Terraform.
  • Solid understanding of observability principles and experience with tools like Datadog.
  • Advanced scripting skills (Bash, Python, or PowerShell).