Site Reliability Engineer

Job Description

As a Site Reliability Engineer (SRE) at Solutions Exchange Inc, you will play a critical role in ensuring the reliability, scalability, and performance of our production systems. You will collaborate closely with software engineering and operations teams to build and maintain tools for automation, monitoring, and operations. Your expertise will be crucial in designing resilient and scalable architectures, optimizing application performance, and resolving complex technical issues to deliver a seamless user experience.

Responsibilities

  • Design, build, and maintain tools and frameworks for deployment, monitoring, and operations.
  • Implement best practices in infrastructure security, scalability, and reliability.
  • Collaborate with cross-functional teams to define Service Level Indicators (SLIs) and to set and meet Service Level Objectives (SLOs).
  • Perform system and application troubleshooting to resolve issues and ensure optimal performance.
  • Design and implement automation strategies to streamline operations and reduce manual intervention.
  • Participate in the on-call rotation and respond to incidents to minimize downtime and impact on users.
  • Conduct post-mortem analyses of incidents and implement measures to prevent recurrence.
  • Continuously evaluate and improve our systems and processes to enhance reliability and efficiency.

Requirements

  • Bachelor's degree in Computer Science, Engineering, or a related technical field, or equivalent practical experience.
  • Proven experience in a Site Reliability Engineer or similar role, with a focus on designing and implementing scalable systems.
  • Strong proficiency in programming, scripting, and automation (e.g., Java, ReactJS).
  • Experience with cloud platforms such as AWS, Azure, or GCP, and container orchestration tools like Kubernetes.
  • Deep understanding of networking, system administration, Windows, and Linux/Unix-based environments.
  • Excellent problem-solving skills and the ability to troubleshoot complex issues in distributed systems.
  • Strong communication skills and the ability to work effectively with stakeholders and in a collaborative team environment.

Preferred Qualifications:

  • Master's degree in Computer Science, Engineering, or a related technical field.
  • Certification in cloud platforms or DevOps methodologies (e.g., AWS Certified DevOps Engineer, Google Professional Cloud DevOps Engineer).
  • Experience with CI/CD pipelines and configuration management tools (e.g., Ansible).
  • Knowledge of monitoring and logging tools such as Prometheus, Grafana, and the ELK stack.
  • Experience with Agile/Scrum methodologies and practices.

Work Setup:

Shift: Dayshift

Setup: Hybrid

Location: Makati

By applying, you consent to the collection, storage, and/or processing of your personal and/or sensitive information for the purposes of recruitment and employment, whether internal to Cobden & Carter International or for its clients.