Site Reliability Senior Engineer
A. PROFILE
Role Title: Site Reliability Senior Engineer
Reporting to: Engineering Manager - DevOps
Division: Information & Communication Technology
Department / Section: Technology & Information
B. CONTEXT
Purpose: This role is responsible for contributing to the ICT planning team. This includes strategic planning, solution roadmaps, capacity planning, and innovation.
Context: The Technology Unit within OM is the backbone of the organization, providing all the technology services that enable OM to deliver its services to its customers across all technology platforms, 24/7/365. The quality of the customer experience sits within this BU, and it therefore plays a significant role in the delivery of revenue and satisfaction targets.
ICT Planning plays a vital role in this context by ensuring that ICT systems meet demand and that ICT strategy is aligned with U9's business strategy and vision.
C. ROLE ACCOUNTABILITIES
- Lead the design, development, and maintenance of a robust and efficient DevOps pipeline to enable continuous integration and delivery of software products.
- Configure and manage automation tools such as Ansible to streamline deployment and configuration management processes.
- Containerize applications using Docker and orchestration tools to enable scalability and portability.
- Maintain and enhance version control systems, primarily Git, to ensure smooth code collaboration and change tracking.
- Plan and implement integration with multiple third-party systems, such as infrastructure, core, ICT, and public cloud systems.
- Develop and maintain microservices using Python, adhering to best practices and coding standards.
- Apply expertise in Oracle Linux and SQL to optimize database performance and troubleshoot issues.
- Collaborate closely with software developers, providing timely support and deploying microservice solutions in the IT environment.
- Plan and scale for multiple applications, ensuring efficient development, maintenance, and performance tuning.
- Monitor system performance, analyze metrics, and implement proactive measures to ensure high availability and scalability.
- Conduct application performance analysis and reporting for environment-related matters.
- Participate in incident management and root cause analysis, identifying and resolving issues to minimize downtime and improve system reliability.
- Work with industry collaborators or research institutes to explore potential new business streams in areas such as automation and process efficiency.
- Undertake any other related or ancillary duties and responsibilities assigned based on U9 business and operational needs.
D. KEY PERFORMANCE INDICATORS
- Time to market for IT application environments
- Scalability of IT applications, with elastic scaling up and down
- Seamless runtime of >=99% for IT applications after go-live
- All systems updated with the latest security patches
E. WORKING RELATIONSHIPS & DECISION MAKING
Interacts with:
Internal:
- Infrastructure team, IT/Network team
- Software development team
- ICT demand team
- ICT Operations team
External:
- Infrastructure vendor
- Security vendor
Decision Making
- Impact analysis approval
- Solution design approval
- Security patch and assurance approval
F. EXPERIENCE AND QUALIFICATIONS
Minimum Experience & Essential Knowledge
- Proven ability to translate business requirements into operating technologies.
- 3 to 5 years of relevant experience in the telecom industry.
- Solid experience in system administration.
Minimum Entry Qualifications
Bachelor's degree in Telecoms Engineering, Computer Science, or equivalent