Site Reliability Engineer Job Openings

About the Site Reliability Engineer Position

Join Chubb IT: Site Reliability Engineer (SRE)

Chubb, a leading global insurance company with a presence in more than 54 countries, is expanding its Engineering Center in Bogotá. This strategic technology hub is focused on driving digital transformation and delivering innovative solutions that support business operations across North America and LATAM.

We are seeking a hands-on Site Reliability Engineer (SRE) to join our LATAM/North America Platforms Team. In this role, you will help ensure the reliability, scalability, and operational excellence of critical enterprise applications while collaborating closely with developers, infrastructure teams, and business stakeholders.

Why Join Us?

  • Be part of a global organization working on high-impact, enterprise-scale technology initiatives.
  • Grow your career in one of the largest and fastest-growing technology hubs in Latin America.
  • Work in a collaborative and innovative environment with regional and global teams.
  • Gain exposure to modern observability, automation, and cloud technologies.
  • Access continuous learning opportunities, certifications, and career development programs.

What You'll Do

Hands-On Engineering & Troubleshooting

  • Troubleshoot and enhance the reliability and performance of critical production applications.
  • Diagnose and resolve complex application and infrastructure issues, including root cause analysis (RCA) and performance debugging.
  • Program and debug .NET applications to support operational stability and reliability goals.

Monitoring, Observability & Analysis

  • Maintain, aggregate, and analyze logs using observability and monitoring tools such as ELK Stack, Splunk, Kibana, Application Insights, AppDynamics, and Dynatrace.
  • Monitor system health, identify performance bottlenecks, and improve user experience.

Automation & Operational Efficiency

  • Develop and maintain automation scripts using PowerShell and Python.
  • Identify repetitive operational tasks and implement automation solutions to improve efficiency and reduce manual effort.
  • Recommend and implement tooling and process improvements aligned with SRE best practices.

Incident Response & Reliability

  • Participate in incident management, postmortems, and preventative reliability initiatives.
  • Support production systems and ensure high availability and operational excellence.
  • Apply SRE principles including SLAs, SLOs, error budgets, and reliability-focused system design.

Infrastructure & Cloud Support

  • Perform IIS administration and support Windows-based environments.
  • Support applications hosted in both on-premises and Azure cloud environments.
  • Assist with environment setup and maintenance while collaborating with infrastructure teams as needed.

Collaboration & Mentorship

  • Work effectively across technical and business teams to resolve issues and improve system performance.
  • Share knowledge with junior engineers and promote a culture of reliability and continuous improvement.
  • Collaborate with global teams to align with enterprise standards and operational best practices.

What We're Looking For

Required Qualifications

  • Bachelor's degree in Computer Science, Engineering, or a related discipline.
  • 5+ years of hands-on experience in Site Reliability Engineering, Production Support Engineering, or Application Support roles.
  • Strong experience with PowerShell scripting, Python programming, and .NET application debugging.
  • Experience troubleshooting complex production issues and conducting root cause analysis.
  • Familiarity with monitoring and observability platforms such as ELK Stack, Splunk, Kibana, AppDynamics, Dynatrace, or Application Insights.
  • Basic to intermediate knowledge of MS SQL Server and database troubleshooting.
  • Strong analytical, organizational, and communication skills.
  • Ability to manage multiple priorities and perform effectively in high-pressure production environments.

Nice to Have

  • Experience with Azure cloud infrastructure and services.
  • Familiarity with AI technologies and their application to operational efficiency and reliability engineering.
  • Exposure to infrastructure-focused SRE environments.
  • Knowledge of chaos engineering, fault injection, or performance optimization practices.
  • Experience working within regulated or highly compliant industries.

What Success Looks Like

  • Rapid and effective resolution of incidents and production issues.
  • High availability and reliability of critical enterprise applications.
  • Continuous improvement in automation, monitoring, and operational efficiency.
  • Strong collaboration and technical leadership within the engineering team.

What You'll Get

  • Hybrid work model based in Bogotá.
  • Competitive salary and comprehensive health benefits.
  • Continuous learning through training programs and professional certifications.
  • Wellness initiatives and a supportive, inclusive workplace culture.
  • Opportunity to work on global initiatives with international teams.

Ready to Take Your Career Global?

Join Chubb and help build the future of technology and reliability engineering across our global platforms. Apply now and become part of a world-class engineering organization.