openai/simple-evals
No description.
Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.
Evidently is an open-source ML and LLM observability framework. Evaluate, test, and monitor any AI-powered system or data pipeline. From tabular data to Gen AI. 100+ metrics.
Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends
RAG evaluation without the need for "golden answers"
Rigorous evaluation of LLM-synthesized code - NeurIPS 2023 & COLM 2024
The LLM Evaluation Framework