JR-108091 Site Reliability Engineer WQ
Job (Project) Description:
We are looking for a Site Reliability Engineer (SRE) to join our team and help design, build, and maintain scalable, reliable infrastructure and operational processes. This role involves managing infrastructure as code, implementing monitoring and alerting systems, and supporting our production environment to ensure high availability and performance.
Locations:
- Mexico
- Portugal
- Spain
Requirements:
- Strong experience with Linux systems administration (advanced user level);
- Hands-on experience with Infrastructure as Code (Terraform);
- Experience with CI/CD practices and tools;
- Proficiency with Docker and Kubernetes for containerization and orchestration;
- Experience with monitoring and alerting systems (Prometheus, Grafana, Elastic Stack);
- Familiarity with operational practices such as runbooks and on-duty/on-call support;
- Experience with relational databases (RDBMS), including knowledge of DML and DDL;
- Mid-level Python skills for scripting or automation (Python development background preferred);
- Familiarity with Kafka for event streaming.
Nice to have:
- Basic knowledge of Apache Spark (understanding how it works);
- Familiarity with NoSQL databases like Redis, MongoDB, etc.
Other skills:
- Excellent written and verbal communication skills in English;
- Ability to work in a global multinational company;
- Good communication skills;
- Ability to lead conversations with both technical and business representatives;
- Proven ability to work both independently and as part of an international project team.
Job Responsibilities:
- Design, build, and maintain infrastructure using Infrastructure as Code (IaC) tools such as Terraform;
- Implement and manage CI/CD pipelines for application and infrastructure deployment;
- Manage containerized workloads with Docker and Kubernetes (K8s);
- Monitor, troubleshoot, and optimize Linux systems (CPU, processes, I/O, logs);
- Set up and maintain logging, monitoring, and alerting systems using Prometheus, Grafana, Elastic Stack, or similar tools;
- Maintain and improve runbooks for operational support and on-call duties;
- Collaborate with development and operations teams to ensure system reliability, scalability, and security.
What We Offer:
- Benefits will be shared during the initial call.