DevOps/SRE

IO TECH SOLUTIONS LIMITED

Hong Kong Island, Hong Kong

Job Description:

About the Role:

We are seeking a skilled and motivated DevOps / Site Reliability Engineer (SRE) with 2+ years of experience to help us build, scale, and maintain robust, secure, and high-availability infrastructure. As a DevOps/SRE team member, you will work closely with development, QA, and operations teams to automate processes, monitor system health, and ensure the reliability of our services.

This is a hands-on role that requires strong technical skills, a deep understanding of modern DevOps tools and practices, and a problem-solving mindset.

Key Responsibilities:

Design, implement, and maintain CI/CD pipelines for reliable code deployment
Monitor application performance and system reliability using tools like Prometheus, Grafana, or Datadog
Maintain and improve cloud infrastructure (e.g., AWS, GCP, Azure) following best practices
Manage infrastructure as code using tools such as Terraform, Ansible, or CloudFormation
Troubleshoot infrastructure and application issues, ensuring minimal downtime and fast resolution
Automate repetitive operational tasks and improve development workflows
Implement and enforce security, backup, and disaster recovery strategies
Participate in the on-call rotation, respond to incidents, and lead root cause analysis and postmortem reviews
Work closely with development teams to ensure applications are designed for performance, availability, and scalability
Optimize resource usage and costs across cloud environments

Qualifications:

Required:

Bachelor's degree in Computer Science, Engineering, or a related field
2+ years of experience in a DevOps, SRE, or Systems Engineering role
Hands-on experience with Linux/Unix system administration
Experience with CI/CD tools such as Jenkins, GitHub Actions, CircleCI, or GitLab CI
Working knowledge of cloud platforms (AWS, GCP, Azure)
Familiarity with containerization and orchestration tools (e.g., Docker, Kubernetes)
Experience with infrastructure as code tools like Terraform, Ansible, or similar
Proficiency in at least one scripting or programming language (e.g., Bash, Python, Go)
Strong understanding of monitoring, logging, and alerting systems
Experience with version control using Git

Preferred:

Experience with Kubernetes administration in production environments
Familiarity with security best practices and compliance standards
Understanding of networking, load balancing, and DNS configurations
Exposure to incident management and SLA/SLO/SLI concepts
Experience working in Agile environments

Required Skills:

GCP, Operations, Root Cause Analysis, Incident Management, Compliance, System Administration, Disaster Recovery, CI/CD, Pipelines, Ansible, Azure, Bash, Version Control, Agile, Load Balancing, Scalability, GitLab, Unix, AWS, Reviews, DevOps, Reliability, Kubernetes, Infrastructure, Availability, Jenkins, Networking, Programming, GitHub, Docker, Linux, Computer Science, Security, Git, Design, Engineering, Python, Management