Job Openings
AI Model Evaluation Specialist
About the job
Key Responsibilities:
- Perform scoring and qualitative evaluations of LLM-generated responses across multiple use cases.
- Develop and maintain scoring guidelines and rubrics to ensure consistency and objectivity.
- Collaborate with data scientists, product managers, and engineering teams to align scoring with project goals.
- Assist in the creation and labeling of high-quality evaluation datasets for prompt tuning or model fine-tuning.
- Utilize NLP-based metrics and tools (e.g., ROUGE, BLEU, cosine similarity) for automated scoring support.
- Document scoring patterns, common model errors, and improvement opportunities.
- Contribute to prompt experimentation and help compare the effectiveness of different prompt strategies.
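As an illustration of the "automated scoring support" mentioned above, the sketch below computes a bag-of-words cosine similarity between a candidate response and a reference answer. This is a minimal, self-contained example for illustration only; the function name, tokenizer, and example texts are assumptions, and production scoring would typically use established libraries or embedding-based similarity rather than raw word counts.

```python
import math
import re
from collections import Counter


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between two texts over word-count vectors.

    A simple bag-of-words measure: order-insensitive, so reworded
    answers with the same vocabulary score close to 1.0.
    """
    # Lowercase and keep alphanumeric tokens only (illustrative tokenizer).
    a = Counter(re.findall(r"[a-z0-9]+", text_a.lower()))
    b = Counter(re.findall(r"[a-z0-9]+", text_b.lower()))
    # Dot product over the shared vocabulary.
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


reference = "The capital of France is Paris."
candidate = "Paris is the capital city of France."
score = cosine_similarity(reference, candidate)
print(round(score, 2))
```

Because this measure ignores word order, it is usually paired with overlap metrics such as ROUGE or BLEU, plus human rubric-based review, when evaluating LLM outputs.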
Qualifications:
- Prior experience with LLMs (e.g., GPT, Claude, LLaMA) or AI/NLP projects is highly preferred.
- Strong analytical skills and attention to detail, especially in assessing language quality.
- Familiarity with prompt engineering, generative AI, or conversational AI tools is a plus.
- Hands-on experience with Python, Jupyter, or evaluation libraries (optional but desirable).
- Experience working with evaluation frameworks or annotation tools (Label Studio, Prodigy, etc.) is a bonus.
- Excellent written and verbal communication skills.