huggingface/lighteval
Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends
🤗 Evaluate: A library for easily evaluating machine learning models and datasets.
Machine learning model evaluation made easy: plots, tables, HTML reports, experiment tracking and Jupyter notebook analysis.
Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.
Rigorous evaluation of LLM-synthesized code - NeurIPS 2023 & COLM 2024
A streamlined and customizable framework for efficient large model (LLM, VLM, AIGC) evaluation and performance benchmarking.
Evidently is an open-source ML and LLM observability framework. Evaluate, test, and monitor any AI-powered system or data pipeline. From tabular data to Gen AI. 100+ metrics.