Sign in
← Back to search

stanford-crfm/helm

Holistic Evaluation of Language Models (HELM) is an open source Python framework created by the Center for Research on Foundation Models (CRFM) at Stanford for holistic, reproducible and transparent evaluation of foundation models, including large language models (LLMs) and multimodal models.

Stars
2,798
Forks
392
Commits
6281
Language
Python
Awesome lists
2

Similar repositories

EvolvingLMMs-Lab/lmms-eval

One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks

4157 stars
Python 1 awesome list

evalplus/evalplus

Rigourous evaluation of LLM-synthesized code - NeurIPS 2023 & COLM 2024

1749 stars
Python 2 awesome lists

huggingface/lighteval

Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends

2428 stars
Python 2 awesome lists

open-compass/VLMEvalKit

Open-source evaluation toolkit of large multi-modality models (LMMs), support 220+ LMMs, 80+ benchmarks

4151 stars
Python 2 awesome lists

Tracked growth

1 capture since 2026-05-25

Latest capture 2026-05-25 21:18

Stars history

Total stars

Commits history

Default branch commits

Metadata

  • Created: 2021-11-29
  • First commit: —
  • Last pushed: 2026-05-20
  • Website: https://crfm.stanford.edu/helm
  • Archived: no
  • Stack detected: —
  • License: Apache-2.0

AI development signals

No AI development config files detected.