THUDM/AgentBench

A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)

  • Stars: 3,451
  • Forks: 256
  • Commits: 75
  • Language: Python
  • Awesome lists: 4

Similar repositories

KwaiKEG/KwaiAgents

A generalized information-seeking agent system with Large Language Models (LLMs).

1,199 stars, Python, 1 awesome list

lmgame-org/GamingAgent

[ICLR 2026] LLM/VLM gaming agents and model evaluation through games.

932 stars, Python, 0 awesome lists

OSU-NLP-Group/Mind2Web-2

[NeurIPS'25 D&B] Mind2Web-2 Benchmark: Evaluating Agentic Search with Agent-as-a-Judge

111 stars, Python, 1 awesome list

Tracked growth

3 captures since 2026-05-23

Latest capture 2026-05-25 21:19

Stars history (chart: total stars)

Commits history (chart: default branch commits)

Metadata

  • Created: 2023-07-28
  • First commit: —
  • Last pushed: 2026-02-08
  • Archived: no
  • Stack detected: —
  • License: Apache-2.0

AI development signals

No AI development config files detected.