  • Stars: 501
  • Forks: 72
  • Commits: 1274
  • Language: Python
  • Awesome lists: 2

Similar repositories

evalplus/evalplus

Rigorous evaluation of LLM-synthesized code - NeurIPS 2023 & COLM 2024

1749 stars
Python · 2 awesome lists

LiveBench/LiveBench

LiveBench: A Challenging, Contamination-Free LLM Benchmark

1179 stars
Python · 1 awesome list

google/BIG-bench

Beyond the Imitation Game collaborative benchmark for measuring and extrapolating the capabilities of language models

3242 stars
Python · 2 awesome lists

openai/mle-bench

MLE-bench is a benchmark for measuring how well AI agents perform at machine learning engineering

1540 stars
Python · 1 awesome list

EvolvingLMMs-Lab/lmms-eval

One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks

4157 stars
Python · 1 awesome list

Tracked growth

2 captures since 2026-05-27

Latest capture: 2026-05-27 12:28

[Chart: Stars history (total stars)]

[Chart: Commits history (default branch commits)]

Metadata

  • Created: 2024-04-29
  • First commit: 2023-04-15
  • Last pushed: 2026-01-03
  • Website: https://bigcode-bench.github.io/
  • Archived: no
  • Stack detected: none
  • License: Apache-2.0

AI development signals

No AI development config files detected.