Job Openings
DevOps / Site Reliability Engineer
About the job
Job Location: remote in Romania
Recruitment process:
- HR discussion
- Technical discussion
Role description:
We have a new opportunity for a seasoned Site Reliability Engineer who will work alongside the development, architecture and service management teams.
This role is instrumental in bridging the gap between development and operations, applying engineering principles to operational challenges to drive continuous improvement and innovation.
Responsibilities:
- Infrastructure Management: Design, build, and maintain scalable and resilient AWS cloud infrastructure.
- Reliability and Performance: Implement monitoring, alerting, and remediation strategies to maintain system health and performance.
- Automation: Create and manage CI/CD pipelines, and automate routine tasks using Infrastructure as Code (IaC) tools such as Terraform and CloudFormation.
- Incident Management: Proactively monitor and respond to system reliability issues, ensuring high availability and reducing downtime.
- Collaboration: Work with development, security, and operations teams to ensure seamless integration and operational reliability of applications.
- Security: Ensure best practices in security and compliance, addressing vulnerabilities in AWS environments.
- Cost Optimization: Analyze and optimize AWS resource usage to balance performance, scalability, and cost.
- Documentation: Write and maintain technical documentation, including architecture diagrams, runbooks, and incident reports.
Profile:
- Knowledge and hands-on experience with cloud platforms and Infrastructure as a Service (IaaS) offerings, preferably Amazon Web Services (AWS) or Microsoft Azure.
- AWS certification (e.g. AWS Solutions Architect Associate or Professional) or other industry certification is beneficial.
- Significant experience in DevOps and SRE implementation, and in evolving practices and ways of working across multi-disciplinary teams, business frameworks and culture.
- AWS Expertise: Strong hands-on experience with AWS services, including EC2, S3, RDS, Lambda, VPC, IAM, and CloudWatch.
- Automation Tools: Proficiency in Infrastructure as Code (IaC) tools such as Terraform and CloudFormation, and in scripting languages such as Python or Bash.
- CI/CD Pipelines: Experience with CI/CD tools such as Jenkins, GitLab CI, or AWS CodePipeline.
- Strong and proven Java skills
- Linux and networking fundamentals
- Experience with containerisation, ideally using Docker and Kubernetes
- Expertise in observability principles and practices, encompassing monitoring, logging, tracing, and alerting systems to ensure transparency and actionable insights into system performance and health. Tools: Dynatrace, Datadog.
- Familiarity with performance and optimization issues, with a demonstrated ability to diagnose and resolve them efficiently.
- Experience across the entire stack: hardware, application and network