Database Reliability Engineer

About the job

We are looking for Database Reliability Engineers (DBREs) who are responsible for keeping our database systems running smoothly 24/7/365. DBREs build tools, design and implement services, and improve the performance and reliability of our database systems as we rapidly scale our product and organization. DBREs will play a highly visible role leading projects for storage capacity forecasting and planning, efficient data backup strategies, and optimizing our sharding approach. Additionally, DBREs will keep an ever-watchful eye on our databases' capacity and performance.

DBREs will partner with other software developers to understand data access patterns and tune our database systems for optimal performance, reliability, and availability. DBREs are peers to SREs and bring database expertise to our Infrastructure Team and engineering teams.

You’ll have the opportunity to manage the complex challenges of scale that are unique to iPrice, while using your expertise in coding, algorithms, complexity analysis, and large-scale system design.

If you are a blend of database engineer, administration gearhead, and software crafter who applies sound engineering principles, operational discipline, and mature automation, with a specialization in databases, then you are the right person!


Responsibilities

  • Engage in and improve the whole lifecycle of services - from inception and design, through to deployment, operation and refinement.
  • Collaborate with engineering teams on their database storage needs, and advise them throughout the development lifecycle.
  • Maintain databases by measuring and monitoring availability, latency, and overall system health, within our Service Level Objectives.
  • Practice sustainable incident response and blameless post-mortems.
  • Support and debug database production issues across services and levels of the stack.
  • Design, develop, and manage monitoring tools that provide performance dashboards and alerts, and that collect the data required to proactively identify issues and recommend improvements.
  • Analyse solutions and implement best practices for our database platforms:
    • Elasticsearch database cluster and its components
    • Cassandra, PostgreSQL, and SQL Azure
    • Other SQL and NoSQL databases.
  • Work with peer SREs to roll out changes to our production environment and help mitigate database-related production incidents.
  • Provide database expertise to engineering teams (for example through reviews of database design, queries and performance optimizations).

Requirements

  • A Bachelor's Degree/Diploma in Computer Science, Information Technology or a related subject.
  • 5+ years of experience in Database or Site Reliability Engineering, with increasing responsibilities.
  • Experience operating distributed data storage systems at scale, especially Elasticsearch and SQL Azure.
  • Professional experience using Python, Go, or Ruby.
  • Strong familiarity with deployment automation/configuration management tools like Chef, Ansible, Puppet, or Terraform.
  • Experience with cloud environments such as AWS, GCP, or Azure.
  • Experience working in an agile, multicultural environment across several Scrum teams at the same time.
  • A Kaizen mindset: a spirit of continuous improvement on a personal level, and a habit of staying up to date with the latest technology trends professionally.
  • Ability to identify problems before they happen and implement solutions that detect and prevent outages.
  • Expertise in designing, analysing and troubleshooting large-scale distributed systems.
  • Ability to debug and optimize code and to automate routine tasks.
  • Systematic problem-solving approach, coupled with effective communication skills and a sense of drive.
  • Understanding of CI/CD principles, Linux fundamentals, networking concepts, and the IP protocol suite.