Production Support/Management IO TECH SOLUTIONS LIMITED

Hong Kong Island, Hong Kong

Job Description:

About the Role:

We are seeking a highly motivated Production Support Engineer with 2+ years of experience to ensure the continuous and efficient operation of our production systems. In this role, you will be responsible for monitoring, troubleshooting, and resolving production issues in real time, as well as improving the overall stability and performance of our services.

You will work closely with development, QA, and operations teams to address incidents, identify root causes, and implement long-term solutions. If you thrive in high-pressure environments and enjoy problem-solving, this could be a perfect fit for you.

Key Responsibilities:

Monitor the health, performance, and availability of production systems and services
Diagnose and resolve production issues quickly, minimizing downtime and impact on end-users
Provide on-call support for production incidents and manage issue escalation as necessary
Collaborate with development teams to investigate root causes of production issues and propose solutions
Perform system health checks and regular system maintenance tasks to ensure optimal performance
Implement monitoring tools and alerting systems to proactively identify potential issues before they impact users
Deploy bug fixes, patches, and system upgrades in production environments
Document issues, resolution steps, and operational procedures for knowledge sharing
Assist in post-incident reviews and implement improvements based on lessons learned
Help implement change management processes to ensure smooth and controlled deployments
Ensure adherence to SLAs (Service Level Agreements) for incident response and resolution times

Qualifications:

Required:

Bachelor's degree in Computer Science, Information Technology, Engineering, or a related field
2+ years of experience in production support or operations management in a tech environment
Familiarity with Linux/Unix or Windows server administration
Strong experience with monitoring and alerting tools (e.g., Prometheus, Grafana, Nagios, New Relic)
Ability to work with log aggregation and analysis tools (e.g., ELK Stack, Splunk)
Proficiency in troubleshooting application, infrastructure, and network issues
Experience with databases (e.g., MySQL, PostgreSQL, MongoDB)
Knowledge of incident management tools (e.g., JIRA, ServiceNow)
Strong understanding of cloud platforms (e.g., AWS, Azure, GCP) and cloud infrastructure
Familiarity with CI/CD pipelines and deployment automation tools

Preferred:

Experience in automation and scripting (e.g., Python, Bash or other shell scripting)
Familiarity with containerization technologies like Docker and orchestration tools like Kubernetes
Experience in load balancing, scaling, and disaster recovery practices
Knowledge of ITIL or other IT operations frameworks
Experience in release management and deployment strategies

Required Skills:

Production Support, GCP, Operations, Incident Management, Disaster Recovery, CI/CD, Shell Scripting, Analysis, Splunk, Steps, Escalation, Pipelines, Lessons, ROOT, ServiceNow, Azure, ITIL, Bash, Load, Checks, Windows Server, Operations Management, Unix, AWS, Reviews, Change Management, Kubernetes, Infrastructure, Availability, Automation, PostgreSQL, Information Technology, MongoDB, Databases, Docker, Linux, Computer Science, Troubleshooting, Windows, Administration, JIRA, MySQL, Maintenance, Engineering, Python, Science, Management