AI Model Evaluation Specialist

Key Responsibilities:

  • Perform scoring and qualitative evaluations of LLM-generated responses across multiple use cases.
  • Develop and maintain scoring guidelines and rubrics to ensure consistency and objectivity.
  • Collaborate with data scientists, product managers, and engineering teams to align scoring with project goals.
  • Assist in the creation and labeling of high-quality evaluation datasets for prompt tuning or model fine-tuning.
  • Utilize NLP-based metrics and tools (e.g., ROUGE, BLEU, cosine similarity) for automated scoring support; a brief sketch follows this list.
  • Document scoring patterns, common model errors, and improvement opportunities.
  • Contribute to prompt experimentation and help compare effectiveness of different prompt strategies.
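To make the automated-scoring item concrete, here is a minimal sketch of scoring one LLM response against a reference answer with ROUGE, BLEU, and TF-IDF cosine similarity. It assumes the rouge-score, nltk, and scikit-learn packages are installed; the metric choices, the `score_response` helper, and the example strings are illustrative, not a prescribed evaluation pipeline.

```python
# Illustrative sketch only: assumes rouge-score, nltk, and scikit-learn
# are installed; metric choices here are examples, not a fixed pipeline.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from rouge_score import rouge_scorer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def score_response(reference: str, candidate: str) -> dict:
    """Score one LLM-generated response against a reference answer."""
    # ROUGE-1 / ROUGE-L F-measures: n-gram and longest-common-subsequence overlap.
    rouge = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
    rouge_scores = rouge.score(reference, candidate)

    # Sentence-level BLEU with smoothing, since single responses are short.
    bleu = sentence_bleu(
        [reference.split()],
        candidate.split(),
        smoothing_function=SmoothingFunction().method1,
    )

    # TF-IDF cosine similarity as a cheap lexical-similarity signal.
    tfidf = TfidfVectorizer().fit_transform([reference, candidate])
    cosine = cosine_similarity(tfidf[0], tfidf[1])[0][0]

    return {
        "rouge1_f": rouge_scores["rouge1"].fmeasure,
        "rougeL_f": rouge_scores["rougeL"].fmeasure,
        "bleu": bleu,
        "cosine_tfidf": cosine,
    }


if __name__ == "__main__":
    print(score_response(
        "The capital of France is Paris.",
        "Paris is the capital city of France.",
    ))
```

In practice, lexical metrics like these are usually paired with the qualitative rubric-based evaluations described above, since surface overlap alone can miss semantically correct paraphrases.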

Qualifications:

  • Prior experience with LLMs (e.g., GPT, Claude, LLaMA) or AI/NLP projects is highly preferred.
  • Strong analytical skills and attention to detail, especially in assessing language quality.
  • Familiarity with prompt engineering, generative AI, or conversational AI tools is a plus.
  • Hands-on experience with Python, Jupyter, or evaluation libraries (optional but desirable).
  • Experience working with evaluation frameworks or annotation tools (Label Studio, Prodigy, etc.) is a bonus.
  • Excellent written and verbal communication skills.