AI Model Evaluation Specialist

Key Responsibilities:

  • Perform scoring and qualitative evaluations of LLM-generated responses across multiple use cases.
  • Develop and maintain scoring guidelines and rubrics to ensure consistency and objectivity.
  • Collaborate with data scientists, product managers, and engineering teams to align scoring with project goals.
  • Assist in the creation and labeling of high-quality evaluation datasets for prompt tuning or model fine-tuning.
  • Utilize NLP-based metrics and tools (e.g., ROUGE, BLEU, cosine similarity) for automated scoring support; a brief sketch follows this list.
  • Document scoring patterns, common model errors, and improvement opportunities.
  • Contribute to prompt experimentation and help compare effectiveness of different prompt strategies.
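To make the automated-scoring item concrete, here is a minimal sketch of scoring one LLM response against a reference answer with ROUGE, BLEU, and TF-IDF cosine similarity. It assumes the rouge-score, nltk, and scikit-learn packages are installed; the metric choices, the `score_response` helper, and the example strings are illustrative, not a prescribed evaluation pipeline.

```python
# Illustrative sketch only: assumes rouge-score, nltk, and scikit-learn
# are installed; metric choices here are examples, not a fixed pipeline.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from rouge_score import rouge_scorer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def score_response(reference: str, candidate: str) -> dict:
    """Score one LLM-generated response against a reference answer."""
    # ROUGE-1 / ROUGE-L F-measures: n-gram and longest-common-subsequence overlap.
    rouge = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
    rouge_scores = rouge.score(reference, candidate)

    # Sentence-level BLEU with smoothing, since single responses are short.
    bleu = sentence_bleu(
        [reference.split()],
        candidate.split(),
        smoothing_function=SmoothingFunction().method1,
    )

    # TF-IDF cosine similarity as a cheap lexical-similarity signal.
    tfidf = TfidfVectorizer().fit_transform([reference, candidate])
    cosine = cosine_similarity(tfidf[0], tfidf[1])[0][0]

    return {
        "rouge1_f": rouge_scores["rouge1"].fmeasure,
        "rougeL_f": rouge_scores["rougeL"].fmeasure,
        "bleu": bleu,
        "cosine_tfidf": cosine,
    }


if __name__ == "__main__":
    print(score_response(
        "The capital of France is Paris.",
        "Paris is the capital city of France.",
    ))
```

In practice, lexical metrics like these are usually paired with the qualitative rubric-based evaluations described above, since surface overlap alone can miss semantically correct paraphrases.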

Qualifications:

  • Prior experience with LLMs (e.g., GPT, Claude, LLaMA) or AI/NLP projects is highly preferred.
  • Strong analytical skills and attention to detail, especially in assessing language quality.
  • Familiarity with prompt engineering, generative AI, or conversational AI tools is a plus.
  • Hands-on experience with Python, Jupyter, or evaluation libraries (optional but desirable).
  • Experience working with evaluation frameworks or annotation tools (Label Studio, Prodigy, etc.) is a bonus.
  • Excellent written and verbal communication skills.