About the job: MS Observability Engineer

About the company:

Founded in Paris in 2014, Ledger is a global platform for digital assets and Web3. Ledger is the world leader in critical digital asset security and utility. With more than 7 million devices sold to consumers in 180+ countries, 20% of the world's crypto assets are secured by Ledger devices.
More than 600 professionals work in distributed teams across the globe, with the head office in Paris. The company also supports and endorses collaboration among blockchain and crypto companies and enthusiasts all over the world.

About technologies and products:

Thanks to our products for individuals, combined with Ledger Live, you can safely manage your digital assets and explore a growing number of Web3 apps through the Discover section.
Ledger Enterprise, our branch dedicated to businesses, enables exchanges and corporations to leverage their digital value with the proper governance and security frameworks.
We use a secure element chip to safeguard digital assets for millions of individuals and hundreds of enterprises. Security is and will remain at the heart of our products. The Ledger Donjon team is made up of world-class experts continuously stress-testing our own solutions and looking for vulnerabilities.
Today, we are primarily known for being a secure gateway to crypto and NFTs. But as the Internet of Value goes mainstream, our devices will allow you to manage an ever-expanding range of assets, including your value, identity, data, stocks, and more.


Responsibilities:

  • Maintain observability tools such as Datadog and a Prometheus-based stack (optimize costs, add integrations, onboard new apps)
  • Design, implement, and maintain advanced monitoring, alerting, and reporting capabilities (e.g. health, performance, availability of hosts and company services)
  • Develop scripts (mostly in Python and Bash) and automation tools to streamline the management of observability tools and reporting
  • Adapt open-source tools related to monitoring and CI/CD pipelines
  • Develop custom Prometheus exporters
  • Collaborate with cross-functional teams to provide insights into application behavior and performance in order to resolve complex issues, ensuring minimal disruption to business operations.
  • Participate in knowledge sharing and documentation:
    • Collect feedback from engineering and operations teams to drive continuous innovation and improvement in the observability product area.
    • Develop best practices for observability and share your knowledge with other team members through workshops and detailed documentation, promoting a culture of observability and continuous improvement.


Qualifications and skills:

  • 3+ years of relevant work experience
  • Understanding of SRE, DevOps, and GitOps principles
  • Understanding of basic security practices
  • Experience with cloud platforms (AWS: RDS, EKS, IAM, CloudWatch)
  • Experience with tools like GitHub CI/CD, Terraform, Ansible, or SaltStack
  • Expertise in monitoring and observability (Prometheus, Alertmanager, or similar solutions)
  • Expertise in Linux and container orchestration (Docker, Kubernetes)
  • Experience in diagnosing performance bottlenecks and other system issues using system tools as well as observability tools
  • Experience in network monitoring, performance troubleshooting, and capacity planning
  • Understanding of how databases function and experience in writing queries to retrieve data
  • Effective written and verbal communication skills in English


Would be a plus:

  • Understanding of blockchain technologies
  • Understanding of CDN tools (Cloudflare)
  • Experience in configuring, managing, and troubleshooting basic Datadog tools such as logs, metrics, and traces for private & public cloud monitoring
  • Experience in designing, implementing, and maintaining alerts based on requirements defined by engineering; improving alerting to minimize false positives and false negatives
  • Experience in developing and executing proof-of-concept projects to evaluate new solutions for potential adoption