centerforaisafety/hle

Humanity's Last Exam

  • Stars: 1,561
  • Forks: 102
  • Commits: 28
  • Language: Python
  • Awesome lists: 1

Similar repositories

LiveBench/LiveBench

LiveBench: A Challenging, Contamination-Free LLM Benchmark

1,179 stars
Python, 1 awesome list

openai/mle-bench

MLE-bench is a benchmark for measuring how well AI agents perform at machine learning engineering

1,540 stars
Python, 1 awesome list

stanford-crfm/helm

Holistic Evaluation of Language Models (HELM) is an open source Python framework created by the Center for Research on Foundation Models (CRFM) at Stanford for holistic, reproducible and transparent evaluation of foundation models, including large language models (LLMs) and multimodal models.

2,798 stars
Python, 2 awesome lists

huggingface/lighteval

Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends

2,428 stars
Python, 2 awesome lists

Tracked growth

2 captures since 2026-05-23

Latest capture 2026-05-31 03:02

[Chart: Stars history (total stars)]

[Chart: Commits history (default branch commits)]

Metadata

  • Created: 2025-01-23
  • First commit: 2025-01-23
  • Last pushed: 2026-02-20
  • Website: https://lastexam.ai
  • Archived: no
  • Stack detected: —
  • License: MIT

AI development signals

No AI development config files detected.