
Ayanami0730/deep_research_bench

DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents

  • Stars: 735
  • Forks: 79
  • Commits: 44
  • Language: Python
  • Awesome lists: 1

Similar repositories

openai/mle-bench

MLE-bench is a benchmark for measuring how well AI agents perform at machine learning engineering

1540 stars · Python · 1 awesome list

THUDM/AgentBench

A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)

3451 stars · Python · 4 awesome lists

Alibaba-NLP/DeepResearch

Tongyi Deep Research, the Leading Open-source Deep Research Agent

18992 stars · Python · 1 awesome list

claw-eval/claw-eval

Claw-Eval is an evaluation harness for evaluating LLMs as agents. All tasks are verified by humans.

632 stars · Python · 2 awesome lists

Tracked growth

2 captures since 2026-05-23

Latest capture 2026-05-31 03:02

[Charts: Stars history (total stars); Commits history (default branch commits)]

Metadata

  • Created: 2025-06-13
  • First commit: 2025-06-13
  • Last pushed: 2026-05-11
  • Website: https://arxiv.org/pdf/2506.11763
  • Archived: no
  • Stack detected: none
  • License: Apache-2.0

AI development signals

No AI development config files detected.