DevOps Senior Lead (SRE Lead)


Job Description:

KEY ACCOUNTABILITIES (Including but not limited to)

Infrastructure Management: Lead the design, implementation, and management of scalable, resilient infrastructure solutions. Ensure high availability, performance, and reliability of both production and non-production environments across cloud and on-premises systems. Manage the infrastructure lifecycle, including planning, provisioning, and decommissioning resources efficiently.

CI/CD Pipeline Development: Oversee the development, maintenance, and optimization of CI/CD pipelines to automate application deployment and infrastructure provisioning. Ensure efficient integration of automated testing, continuous integration, and continuous deployment to facilitate rapid, reliable, and safe releases. Implement, maintain, and refine infrastructure-as-code practices for consistency and scalability.

Team Building: Build the team and recruit new hires as needed.

Monitoring & Incident Response: Lead the implementation and maintenance of monitoring, alerting, and logging systems to track application and infrastructure performance. Respond to high-severity incidents, troubleshoot issues, conduct root cause analysis, and implement preventive measures to reduce recurrence. Continuously improve incident management processes to minimize downtime and ensure swift recovery.

Automation & Scripting: Automate routine operational tasks and infrastructure provisioning using scripting languages (e.g., Python, Bash, PowerShell). Develop and maintain custom automation scripts for tasks related to deployment, scaling, and monitoring.

Collaboration & Communication: Collaborate closely with development teams to integrate new features, services, and tools into the infrastructure. Work with security teams to establish and enforce best practices for infrastructure and application security. Communicate effectively with stakeholders and other departments, providing regular updates on system health, performance, and incident resolution.

Performance Optimization: Lead efforts to continuously monitor, analyze, and optimize system performance, identifying and resolving inefficiencies or bottlenecks. Implement load testing, performance tuning, and system scaling strategies to ensure applications can meet user demand and business needs.

Documentation: Create and maintain comprehensive documentation for infrastructure, deployment processes, operational procedures, and disaster recovery plans. Ensure that troubleshooting guides, best practices, and technical solutions are accessible, clear, and kept up to date.

Security & Compliance: Lead initiatives to implement and maintain security best practices for infrastructure, deployment processes, and services. Ensure compliance with security policies, regulatory requirements, and internal standards. Perform regular security audits and vulnerability assessments to identify and mitigate potential risks.

QUALIFICATIONS, EXPERIENCE, SKILLS

Education: Bachelor's degree in computer science or computer engineering.

Experience: Minimum 5 years of experience.

Skills:
Strong command of the English language, both written and verbal.
Proven experience with Linux administration (including server management and performance tuning).
In-depth knowledge of database administration for technologies like Elasticsearch, MongoDB, and PostgreSQL.
Extensive experience in scripting with Python, Bash, and/or PowerShell.
Solid experience with cloud platforms (e.g., AWS, Azure, GCP) and infrastructure management tools (e.g., Terraform, Ansible).
Experience in leading or mentoring a DevOps team, managing multiple simultaneous projects, and delegating tasks based on project criticality.
Expertise in continuous integration, delivery pipelines, and version control (Git, GitHub, GitLab).
Experience with Python-based web technologies such as Django is a plus.
Familiarity with containerization (Docker, Kubernetes) and microservices architecture.
Ability to work cross-functionally with diverse teams and drive collaboration between development, operations, and security teams.
Strong communication and presentation skills, with the ability to explain complex technical concepts clearly to non-technical stakeholders.
Attention to detail and a proactive approach to problem-solving.