AI Automation Test Lead
About the job
We are seeking an AI Automation Test Lead to drive the quality strategy for our clients' AI/LLM-powered products. You will build and scale test automation frameworks that validate not only functionality but also model behavior, data quality, and non-deterministic outputs. This role combines deep test automation expertise with hands-on experience in AI/ML systems, and you will lead a team that ensures our AI features are reliable, safe, and performant in production.
Key Responsibilities
1. Test Strategy & Leadership
- Define and own the end-to-end QA strategy for AI/LLM products, including chatbots, agents, RAG pipelines, recommendation systems, and CV/NLP models
- Establish best practices for testing non-deterministic systems: prompt evaluation, hallucination detection, bias/safety testing, latency & cost regression
- Lead, mentor, and grow a team of SDETs/automation engineers. Set quality gates for CI/CD and release readiness
- Partner with Product, Data Science, and Engineering to shift quality left and define acceptance criteria for AI features
2. AI Test Automation Architecture
- Architect automation frameworks for AI systems: prompt regression suites, golden dataset evaluation, synthetic data generation, LLM-as-judge pipelines
- Build tooling to test RAG quality: context relevance, grounding, citation accuracy, retrieval latency
- Automate testing of model APIs, vector DBs, embedding pipelines, and fine-tuning workflows
- Implement eval harnesses using frameworks like DeepEval, RAGAS, LangSmith, Promptfoo, or custom solutions
3. Data & Model Quality
- Design tests for data pipelines feeding AI: schema validation, drift detection, feature consistency between training/serving
- Own offline/online eval pipelines. Track metrics: accuracy, faithfulness, toxicity, P50/P95 latency, token cost
- Build canary & shadow testing for model deployments. Define rollback criteria based on guardrail violations
4. Traditional + AI System Testing
- Drive API, UI, and integration test automation for services hosting AI models
- Run performance, load, and chaos testing for LLM inference endpoints and real-time features
- Security testing for prompt injection, jailbreak, data leakage, and PII handling
5. Governance & Reporting
- Create quality dashboards: model eval trends, defect leakage, flaky rate, coverage for AI scenarios
- Drive root cause analysis for AI incidents. Feed learnings back into dataset curation and test design
- Ensure compliance with AI safety, privacy, and regulatory requirements
Required Qualifications
- 8+ years in software QA/test automation, with 2+ years leading teams
- 3+ years hands-on testing AI/ML systems, LLMs, or data-intensive platforms
- Strong Python coding skills for test framework development. Java/Go is a plus
- Experience with test automation: Pytest, Playwright, Selenium, REST/GraphQL, and CI/CD with GitHub Actions or Jenkins
- Deep understanding of LLM/RAG concepts: prompts, embeddings, vector DBs, chunking, eval metrics
- Hands-on experience with Flink/Spark, SQL, and Hive for validating data pipelines
- Experience with cloud and container platforms: AWS/GCP, Docker, Kubernetes, and model serving on GPU/CPU
- Experience building eval pipelines using LangSmith, Langfuse, Weights & Biases, MLflow, or similar
- Strong grasp of statistics for A/B testing, significance, and measuring non-deterministic systems
Preferred Qualifications
- Prior experience testing multi-agent systems, tool use, and function calling
- Knowledge of red-teaming, AI safety evals, bias/fairness testing
- Contributions to open-source AI eval or testing frameworks
- Experience with Doris, ClickHouse, Elasticsearch, or Druid for test data analysis
- Background in FinTech, E-commerce, or Search domains with real-time requirements