
LLM Evaluation Framework

Define test cases and score LLM outputs on accuracy, faithfulness, and tone. Build a regression tracker that alerts you when a prompt change breaks a previously passing test.

365 days access
Intermediate
Total Fee: 149

Project Overview

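The tracker's core loop — comparing per-test scores across prompt versions and alerting when a previously passing test fails — can be sketched as below. This is a minimal illustration under assumed conventions: the `detect_regressions` name, the dict-of-scores shape, and the 0.7 pass threshold on a 0–1 scale are all illustrative choices, not part of the project spec.

```python
# A minimal sketch of the regression tracker, assuming each prompt version
# yields a 0-1 score per named test case. `detect_regressions` and the
# 0.7 pass threshold are illustrative assumptions, not a prescribed design.
PASS_THRESHOLD = 0.7  # assumed pass mark on a 0-1 score scale

def detect_regressions(baseline: dict[str, float],
                       candidate: dict[str, float]) -> list[str]:
    """Return names of tests that passed under the baseline prompt
    but fail under the candidate prompt."""
    regressions = []
    for name, old_score in baseline.items():
        new_score = candidate.get(name)
        if new_score is None:
            continue  # test not re-run under the new prompt; skip it
        if old_score >= PASS_THRESHOLD and new_score < PASS_THRESHOLD:
            regressions.append(name)
    return regressions

baseline = {"refund_policy": 0.9, "tone_greeting": 0.8}
candidate = {"refund_policy": 0.55, "tone_greeting": 0.85}
print(detect_regressions(baseline, candidate))  # → ['refund_policy']
```

A function like this is easy to wire into CI so that a failing check blocks a prompt change before it ships.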

You will learn to:

  • Design a rigorous evaluation test suite with multiple scoring dimensions
  • Use an LLM-as-judge to automatically score other LLM outputs
  • Track performance across prompt versions and detect regressions
  • Calculate inter-rater agreement between automated and human evaluators
  • Produce structured failure analyses that guide prompt improvement
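One bullet above — inter-rater agreement between automated and human evaluators — has a standard measurement: Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The hand-rolled sketch below assumes binary pass/fail labels from an LLM judge and a human rater; in practice you might reach for `sklearn.metrics.cohen_kappa_score` instead.

```python
# Cohen's kappa between two raters over the same labeled items:
# kappa = (observed agreement - chance agreement) / (1 - chance agreement).
# A self-contained sketch assuming categorical labels (e.g. "pass"/"fail").
from collections import Counter

def cohen_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum((freq_a[label] / n) * (freq_b[label] / n)
                   for label in set(freq_a) | set(freq_b))
    return (observed - expected) / (1 - expected)

judge = ["pass", "pass", "fail", "pass", "fail", "pass"]  # LLM-as-judge labels
human = ["pass", "fail", "fail", "pass", "fail", "pass"]  # human labels
print(round(cohen_kappa(judge, human), 3))  # → 0.667
```

A kappa near 1.0 means the automated judge tracks the human closely; values much below that suggest the judge prompt itself needs iteration before its scores can gate regressions.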

Technologies You'll Use

Python, Java, JavaScript, CSS, ReactJS

What's Included

  • Detailed Project Requirements
  • Implementation Milestones
  • Submission Checklist
  • Review Guidance
  • Certificate of Completion