Site Reliability Engineer (DevOps Engineer)
About the job
This is where you and your skills come in. We're currently looking for a Site Reliability Engineer (DevOps Engineer) for our AI SRE team.
Responsibilities:
- Provide ongoing support for production environments: troubleshoot, manage and resolve incidents, perform root cause analysis (RCA), and facilitate blameless post-mortems
- Actively contribute to production stability improvements (process improvements)
- Take care of monitoring, alerting, logging, and capacity management
- Implement CI/CD pipelines and related automation
- Understand the architecture of our services and products
- Design automated software and product upgrades, change management, and release management solutions
- Interact with our internal customers, mostly Developers/QA and Ops/SRE
- Work with the teams responsible for Infrastructure, Networking, Applications Engineering, and Information Security
- Continuously improve and share knowledge of the system, and keep documentation up to date
- Take part reliably in on-call rotations and schedules
Requirements:
- Solid knowledge and strong experience in production support activities
- Understanding of SRE principles and DevOps practices
- Understanding of, and experience with, the troubleshooting process
- At least 2-3 years of experience as a Linux System Administrator
- Key Kubernetes (K8s) skills: an understanding of networking, security, and storage is critical
- AWS cloud experience (IAM, VPC, R53, AZs, EC2/EKS, RDS, S3, CloudFront, CloudWatch)
- Understanding of real-time data streaming (Kafka)
- Understanding of the basic concepts of Elasticsearch (Node, Cluster, Index, Document, Shard, Replica)
- RDBMS administration experience (preferably PostgreSQL/Aurora, or MySQL)
- Experience working with Git, Prometheus, Grafana, Terraform, and Helm
- Knowledge of CI/CD tools and the ability to automate deployment activities
- Familiarity with application and service monitoring tools and techniques
- Effective communication skills (active listening, friendliness, confidence, sharing feedback, respect)
- English - Intermediate (B1)
Personal skills:
- Team player
- Fast learner
- Documentation culture
Will be a strong plus:
- Systems thinking approach
- Expertise in automating system administration tasks with configuration management tools
- Hands-on automation experience (Python, Bash, Golang)
- MS Azure/GCP cloud experience
- Flux + Kustomize/Flagger/Strimzi + Istio
- Experience with MongoDB
- Experience using the ELK stack
- Web server administration experience (Nginx)
What we offer:
- Well-coordinated professional team.
- Cutting-edge technologies, interesting and challenging tasks, a dynamic project, and great opportunities for self-realization and for professional and career growth.
- Additional Health and Life Insurance Package.
- Employee Assistance Program.
- 25 vacation days.