
openai/evals

Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.

  • Stars: 18,531
  • Forks: 2,967
  • Commits: 691
  • Language: Python
  • Awesome lists: 4

Similar repositories

evidentlyai/evidently

Evidently is an open-source ML and LLM observability framework. Evaluate, test, and monitor any AI-powered system or data pipeline. From tabular data to Gen AI. 100+ metrics.

7,534 stars · Jupyter Notebook · 4 awesome lists

huggingface/lighteval

Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends

2,428 stars · Python · 2 awesome lists

vectara/open-rag-eval

RAG evaluation without the need for "golden answers"

370 stars · Python · 1 awesome list

evalplus/evalplus

Rigorous evaluation of LLM-synthesized code - NeurIPS 2023 & COLM 2024

1,749 stars · Python · 2 awesome lists

Tracked growth

2 captures since 2026-05-25

Latest capture 2026-05-25 21:12

[Charts: Stars history (total stars); Commits history (default branch commits)]

Metadata

  • Created: 2023-01-23
  • First commit: —
  • Last pushed: 2026-04-14
  • Archived: no
  • Stack detected: —
  • License: NOASSERTION

AI development signals

No AI development config files detected.