Job Description:
Job Title: Platform Engineer Linux & Containers
Business Division: Cloud Services
Department: Cloud Advisory & Transformation
Supervision: (Number of subordinates reporting to the incumbent)
Direct: 0
Indirect: 0
JOB CONTENT
Role Objective
The role encompasses the deployment, configuration, hardening, testing, and Level 3 (L3) support of critical infrastructure components. These responsibilities are pivotal in maintaining the reliability, security, and performance of the systems within the scope of this role.
Detailed Roles and Responsibilities:
Infrastructure Deployment and Management:
- Deploy and manage Linux-based infrastructure, including servers, virtual machines, and containers using scripts and tools (e.g. Terraform, ARM).
- Ensure the availability, scalability, and performance of Linux-based systems.
Configuration Management:
- Develop and maintain automated configuration management scripts and tools (e.g. Ansible, PowerShell DSC) to streamline system provisioning and configuration.
- Implement best practices for system configuration and standardization.
Security Hardening:
- Conduct security hardening of Linux systems and containers to protect against vulnerabilities and security threats.
- Monitor and remediate security vulnerabilities promptly.
Container Orchestration:
- Implement and manage container orchestration platforms like Kubernetes to support containerized applications.
- Ensure high availability, scalability, and reliability of containerized workloads.
Testing and Quality Assurance:
- Develop and execute testing strategies for Linux-based systems and containers, including performance testing, integration testing, and security testing.
- Identify and resolve issues through thorough testing and validation.
L3 Operations and Incident Response:
- Provide Level 3 support for Linux and container-related incidents and issues, including troubleshooting and resolution.
- Ensure timely response and adherence to Service Level Agreements (SLAs) for incident resolution, minimizing downtime and service disruptions.
- Conduct regular backups of critical data and systems, ensuring data integrity and availability in case of data loss or system failures.
- Implement and manage patch management processes to keep Linux systems and containers up to date with security patches and updates.
- Participate in incident response and disaster recovery activities.
Monitoring and Performance Optimization:
- Implement monitoring solutions (e.g., Azure Monitor, Prometheus, Grafana) to proactively identify performance bottlenecks and system issues.
- Optimize system performance and resource utilization.
Documentation and Knowledge Sharing:
- Maintain comprehensive documentation for system configurations, procedures, and troubleshooting guides.
- Share knowledge and expertise with team members and stakeholders.
Compliance and Best Practices:
- Ensure Linux systems and containers adhere to industry best practices and compliance standards (e.g., CIS benchmarks).
- Regularly audit and validate compliance.
Capacity Planning:
- Monitor resource utilization trends and perform capacity planning to accommodate future growth.
- Recommend infrastructure enhancements as needed.
Continuous Improvement:
- Stay updated on emerging technologies and best practices in Linux and containerization.
- Propose and implement improvements to enhance system efficiency, security, and reliability.
Collaboration and Communication:
- Collaborate with cross-functional teams, including developers, DevOps engineers, and system administrators, to support application deployment and operations.
- Communicate effectively with stakeholders to provide updates on system status and incident resolution.
- Work with architects on the requirements and design phases of solutions.
Assist in Technology Assessment and Selection:
- Collaborate in the assessment and selection of appropriate IT technology solutions to meet specified requirements.
- Evaluate proposals from equipment, software, and IT service providers to make informed decisions.
Ensure Quality and Standards:
- Ensure the use of suitable performance and diagnostic tools for software testing and change evaluation.
- Maintain and improve installation procedures and standards within the designated area of responsibility.
- Uphold quality standards for internal and client-oriented documentation.
Technical Standards and Change Management:
- Develop technical standards and maintain a repository for deliverables, methodologies, and deployment documents.
- Lead change management efforts by preparing implementation plans, rollback plans, test plans, and conducting risk and impact analyses for critical or complex changes.
Kubernetes Cluster Management:
- Create and manage deployments, pods, services, and namespaces within Kubernetes clusters.
- Implement auto-scaling and load balancing for applications as needed.
- Collaborate with DevOps teams to integrate Kubernetes deployments into CI/CD pipelines.
Automation and Infrastructure as Code (IaC):
- Facilitate the deployment of applications to Kubernetes clusters through CI/CD workflows.
- Develop automation scripts using Kubernetes API and Azure CLI to streamline cluster management and deployment processes.
- Utilize Infrastructure as Code (IaC) tools like Terraform or ARM templates for efficient infrastructure provisioning.
The Platform Engineer will also contribute to both high-level and low-level design under the guidance of the Architecture team.
KEY INTERACTIONS
Internal:
- Management
- Project Teams
- Architecture Team
- Engineering Team
- Developers
External:
- Vendors and suppliers
- Customers
BEHAVIORAL SKILLS, KNOWLEDGE, AND EXPERIENCE
(Special behavioral skills, knowledge and experience needed for the satisfactory performance of the job)
Educational Qualifications:
- Diploma or Degree in Computer Science or Engineering
- Accreditations:
  - Linux certification (Red Hat Certified Engineer): mandatory
  - CKA (Certified Kubernetes Administrator): advantageous
Skills & Experience:
- A minimum of 5 years of relevant IT experience, including a mandatory 3 years of hands-on experience managing cloud workloads, with a focus on Red Hat (mandatory), AKS (mandatory), SLES (advantageous), OpenShift (advantageous), and containers.
- Experience in performance monitoring and capacity management, backup/restore operations, update management, and hardening in Linux and AKS environments.
- Experience in working in a senior technical support capacity (L3).
- Effective communication, facilitation and influencing skills - ability to present ideas clearly and concisely.
- Experience with transformation projects and successful transitions to implementation support teams.
- Knowledge of and experience in implementing Linux server services according to best practices
- Experience in writing scripts in Bash (mandatory), Python, and PowerShell
- Experience in IaC tools (Terraform - mandatory, ARM - advantageous)
- Experience in configuration management tools (Ansible - mandatory, PowerShell DSC, Helm, Azure Policy)
- Strong analytical/troubleshooting skills
- IT Infrastructure deployment experience including troubleshooting, design, and implementation.
Behavioral Skills:
- Strong communication skills
- Ability to work independently under pressure and to prioritize with minimal supervision
- Multi-tasking skills and attention to detail
- Team player with ability to work with cross functional teams
- Critical thinking skills
- Ability to juggle multiple, competing, frequently changing time-sensitive deadlines and priorities
The job holder may be required to undertake additional duties that may reasonably be expected and that form part of the function of the job.