About the job AI Automation Test Lead

We are seeking an AI Automation Test Lead to drive the quality strategy for our clients' AI/LLM-powered products. You will build and scale test automation frameworks that validate not just functionality, but also model behavior, data quality, and non-deterministic outputs. This role combines deep test automation expertise with hands-on experience in AI/ML systems, leading a team to ensure our AI features are reliable, safe, and performant in production.

Key Responsibilities

1. Test Strategy & Leadership

  • Define and own the end-to-end QA strategy for AI/LLM products, including chatbots, agents, RAG pipelines, recommendation systems, and CV/NLP models
  • Establish best practices for testing non-deterministic systems: prompt evaluation, hallucination detection, bias/safety testing, and latency and cost regression testing
  • Lead, mentor, and grow a team of SDETs/automation engineers. Set quality gates for CI/CD and release readiness
  • Partner with Product, Data Science, and Engineering to shift quality left and define acceptance criteria for AI features

2. AI Test Automation Architecture

  • Architect automation frameworks for AI systems: prompt regression suites, golden dataset evaluation, synthetic data generation, LLM-as-judge pipelines
  • Build tooling to test RAG quality: context relevance, grounding, citation accuracy, retrieval latency
  • Automate testing of model APIs, vector DBs, embedding pipelines, and fine-tuning workflows
  • Implement eval harnesses using frameworks like DeepEval, RAGAS, LangSmith, Promptfoo, or custom solutions

3. Data & Model Quality

  • Design tests for data pipelines feeding AI: schema validation, drift detection, feature consistency between training/serving
  • Own offline/online eval pipelines. Track metrics: accuracy, faithfulness, toxicity, P50/P95 latency, token cost
  • Build canary & shadow testing for model deployments. Define rollback criteria based on guardrail violations

4. Traditional + AI System Testing

  • Drive API, UI, and integration test automation for services hosting AI models
  • Run performance, load, and chaos testing against LLM inference endpoints and real-time features
  • Security testing for prompt injection, jailbreak, data leakage, and PII handling

5. Governance & Reporting

  • Create quality dashboards: model eval trends, defect leakage, flaky rate, coverage for AI scenarios
  • Drive root cause analysis for AI incidents. Feed learnings back into dataset curation and test design
  • Ensure compliance with AI safety, privacy, and regulatory requirements

Required Qualifications

  • 8+ years in software QA/test automation, with 2+ years leading teams
  • 3+ years hands-on testing AI/ML systems, LLMs, or data-intensive platforms
  • Strong coding in Python for test framework development. Java/Go is a plus
  • Experience with test automation tooling: Pytest, Playwright, Selenium, REST/GraphQL API testing, and CI/CD with GitHub Actions or Jenkins
  • Deep understanding of LLM/RAG concepts: prompts, embeddings, vector DBs, chunking, eval metrics
  • Hands-on experience with Flink/Spark, SQL, and Hive for validating data pipelines
  • Experience with cloud and container platforms: AWS/GCP, Docker, Kubernetes, and model serving on GPU/CPU
  • Experience building eval pipelines using LangSmith, Langfuse, Weights & Biases, MLflow, or similar
  • Strong grasp of statistics for A/B testing, significance, and measuring non-deterministic systems

Preferred Qualifications

  • Prior experience testing multi-agent systems, tool use, and function calling
  • Knowledge of red-teaming, AI safety evals, bias/fairness testing
  • Contributions to open-source AI eval or testing frameworks
  • Experience with Doris, ClickHouse, Elasticsearch, or Druid for test data analysis
  • Background in FinTech, E-commerce, or Search domains with real-time requirements