Site Reliability Engineer
Company Overview
We are a consulting company made up of happy people with a passion for technology!
We love technology, we love design and we love quality. Our diversity makes us unique and creates an inclusive and welcoming workplace where each individual is highly valued.
With us, everyone can be themselves and respects others for who they are. We believe that when a fantastic mix of people come together and share their knowledge, experiences and ideas, we can help our customers on a completely different level.
We are looking for people who want to grow with us!
With us, you have great opportunities to take real steps in your career and to take on significant responsibility.
Job description:
- Develop, test, and maintain high-quality software solutions, frameworks and automations.
- Collaborate with cross-functional teams to analyze requirements and design solutions around stability and reliability.
- Participate in code reviews to ensure code quality and shared knowledge.
- Identify, troubleshoot, and resolve incidents and problems.
- Ensure DevOps/SRE best practices are followed.
- Contribute to continuous improvement initiatives within the engineering team.
- Proficiency in one or more programming/scripting languages such as Python.
- Solid understanding of Agile development methodologies.
- Willingness to work with operations, incident management and problem management.
- Good knowledge of at least one major cloud service provider, such as Microsoft Azure or GCP.
- Experience in building CI/CD workflows using GitHub Actions.
- Experience in observability setup (application and infrastructure) using tools such as Splunk and Grafana.
- Familiarity with version control systems such as Git.
- Good problem-solving skills and eagerness to learn.
- Excellent communication and teamwork skills.
- Infrastructure Management: Design, build, and maintain scalable and reliable infrastructure. Optimize system performance and reliability by managing cloud or on-premises infrastructure.
- Incident Management: Lead incident response efforts to diagnose and resolve critical issues. Participate in the on-call rotation and develop runbooks for incident response.
- Automation and DevOps: Develop and implement automation tools and frameworks to reduce manual tasks and enhance system reliability. Advocate for DevOps best practices within the engineering team and implement CI/CD workflows.
- Performance Optimization: Analyze system performance metrics to identify bottlenecks and optimize system performance. Implement monitoring and alerting solutions to detect and resolve issues proactively.
- Security and Compliance: Ensure systems are secure and compliant with industry standards. Conduct security assessments and work with security teams to implement necessary controls.
- Continuous Improvement: Identify opportunities for process improvements and implement best practices for system reliability and performance. Collaborate with software engineers to enhance the reliability and availability of applications and services.
- Documentation and Knowledge Sharing: Create and maintain comprehensive documentation of systems, processes, and procedures. Share knowledge and mentor junior team members.
- Observability: Develop monitoring and alerting based on service level indicators and objectives (SLIs/SLOs) for applications and infrastructure.
Required cloud certification: AZ-900 (Microsoft Azure Fundamentals)
Start: Immediate
Location: Bangalore, India
Form of employment: Full-time, until further notice. We apply a 6-month probationary period.
We interview candidates on an ongoing basis, so do not wait to submit your application.