AI Ops Engineer

About the job

Job Overview:

The AIOps Engineer is responsible for integrating machine learning and advanced analytics into our existing monitoring and logging systems. This role will leverage artificial intelligence to automate operations, detect anomalies proactively, and implement self-healing frameworks that enhance the stability and performance of our infrastructure, delivering phased improvements toward operational excellence. The ideal candidate will be proactive in identifying gaps and helping to develop solutions.

Key Responsibilities:

  • Apply machine learning algorithms to existing operational data (logs, metrics, events) to predict system failures and proactively address potential incidents.
  • Implement automation for routine DevOps practices including automated scaling, resource optimization, and controlled restarts.
  • Develop and maintain self-healing systems to reduce manual intervention and enhance system reliability.
  • Build anomaly detection models to quickly identify and address unusual operational patterns.
  • Collaborate closely with SREs, developers, and infrastructure teams to continuously enhance the operational stability and performance of the system.
  • Provide insights and improvements through visualizations and reports leveraging AI-driven analytics.
  • Create a phased roadmap to incrementally enhance operational capabilities and align with strategic business goals.

Required Skills and Qualifications:

  • Strong experience with AI/ML frameworks and tools (e.g., TensorFlow, PyTorch, scikit-learn).
  • Proficiency in data processing and analytics tools (e.g., Splunk, Prometheus, Grafana, ELK stack).
  • Solid background in scripting and automation (Python, Bash, Ansible, etc.).
  • Experience with cloud environments and infrastructure automation.
  • Proven track record in implementing proactive monitoring, anomaly detection, and self-healing techniques.
  • Excellent analytical, problem-solving, and strategic planning skills.
  • Strong communication skills and the ability to effectively collaborate across teams.

Preferred Experience:

  • Background in DevOps/Site Reliability Engineering.
  • Familiarity with containerization and orchestration platforms (Kubernetes, Docker).
  • Experience in building scalable, distributed systems.

This role is pivotal in enabling our organization to achieve and sustain Operational Excellence through intelligent automation and proactive monitoring practices.

Package Details

The following optional perks are available.

Healthy Living:

Employees are encouraged to participate in programs that provide physical exercise and stress reduction and help build corporate camaraderie and culture. Programs vary by location but include convenient and affordable on-site fitness classes, sports leagues, on-site weight loss programs and support, biometric screenings, chair massages, and walking and healthy living challenges.

Medical & Prescription Coverage:

Our Medical coverage helps employees maintain their well-being through preventive care and access to an extensive network of providers, as well as affordable prescription medication. Employees can choose from several plan types, including PPO and HDHP plans.

Dental Coverage:

Routine preventive care such as regular dental checkups can help lower the risk of stroke and heart disease. Highbrow Dental coverage provides employees and their families with affordable options for overall health. Plan options include PPO and HMO.

Vision Coverage:

Highbrow offers a comprehensive Vision benefit option to ensure employees and their families have access to quality vision care.

Flexible Spending Accounts:

Flexible Spending Accounts allow employees to set aside pre-tax deductions to pay for out-of-pocket health care expenses such as deductibles, copays and coinsurance, as well as dependent care expenses.

Life/AD&D Insurance:

Life and AD&D benefits are essential to the financial security of employees and their families. Basic Life & AD&D benefits are provided at no cost to employees. Employees may purchase Voluntary Life insurance for themselves and their families.

Income Protection:

Short Term Disability (STD) benefits are available to employees on a voluntary basis. STD insurance protects a portion of the employee’s income if they become partially or totally disabled for a short period of time. Long Term Disability (LTD) benefits are also available to employees on a voluntary basis. LTD insurance protects a portion of the employee’s income if they become partially or totally disabled for an extended period of time.

401(k):

The 401(k) Retirement Savings Plan is an excellent way to invest for the future. The Highbrow 401(k) plan provides employees with the tools and flexibility they need to retire comfortably and securely.