github Active

Repository profile

THUDM/AgentBench

A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)

Python · Apache-2.0 · main · Stack scanned · README.md

Stars: 3,579
Forks: 270
Watchers: 26
Issues: 74
Commits: 75
Awesome lists: 4

Activity and growth

Tracked growth, recent movement, and commit velocity from stored repository snapshots.

Latest capture 2026-07-16 03:06

Star growth, last 7 days: 0 (0.0%)
Commit velocity, last 7 days: 0 (0.0%)
Stars since baseline: +134
Snapshot coverage: 7

Tracked growth

7 captures since 2026-05-23

Stars from baseline +134

[Charts: stars history (total stars) and commits history (default branch commits), shown over all tracked data]

Detected stack

Frameworks, package managers, ecosystems, and dependency manifests found during catalog scans.

Scanned 2026-07-16 03:06

Stack signals: 2
Package managers: 1
Manifest files: 4
Dependencies: 39

Frameworks and tools

FastAPI web framework · high confidence
Flask web framework · high confidence

Package manager: pip (Python)

Dependency files

4 manifests

requirements.txt: Python ecosystem, 18 dependencies
src/server/tasks/dbbench/requirements.txt: Python ecosystem, 1 dependency
src/server/tasks/knowledgegraph/requirements.txt: Python ecosystem, 2 dependencies
src/server/tasks/webshop/requirements.txt: Python ecosystem, 18 dependencies

Classification

Searchable topics, generated tags, and stack labels that explain where this repository fits.

Topics: 4
Tags: 0
Stacks: 2

Topics

#chatgpt #gpt-4 #llm #llm-agent

Generated tags

No generated tags yet.

Stack labels

FastAPI Flask

AI development signals

Agent instructions and tool configuration paths found in the repository tree.

0 paths

No AI development config files detected.

Similar repositories

Nearest indexed repositories by embedding similarity.

TheAgentCompany/TheAgentCompany

An agent benchmark with tasks in a simulated software company.

740 stars

Python 1 awesome list

KwaiKEG/KwaiAgents

A generalized information-seeking agent system with Large Language Models (LLMs).

1,200 stars

Python 1 awesome list

lmgame-org/GamingAgent

[ICLR 2026] LLM/VLM gaming agents and model evaluation through games.

951 stars

Python 0 awesome lists

Ayanami0730/deep_research_bench

DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents

784 stars

Python 1 awesome list

harbor-framework/terminal-bench

A benchmark for LLMs on complicated tasks in the terminal

2,450 stars

Python 2 awesome lists

OSU-NLP-Group/Mind2Web-2

[NeurIPS'25 D&B] Mind2Web-2 Benchmark: Evaluating Agentic Search with Agent-as-a-Judge

111 stars

Python 1 awesome list

Metadata

Language: Python
License: Apache-2.0
Default branch: main
Created: 2023-07-28
First commit: 2023-10-18
Last pushed: 2026-02-08
GitHub updated: 2026-07-15
Last synced: 2026-07-16 03:06
Stack detected: 2026-07-16 03:06
Archived: no

Links and files

GitHub README
