OSU-NLP-Group/Mind2Web-2

[NeurIPS'25 D&B] Mind2Web-2 Benchmark: Evaluating Agentic Search with Agent-as-a-Judge

Stars: 111
Forks: 7
Commits: 82
Language: Python
Awesome lists: 1

Similar repositories

THUDM/AgentBench

A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)

3451 stars
Python, 4 awesome lists

web-arena-x/webarena

Code repo for "WebArena: A Realistic Web Environment for Building Autonomous Agents"

1489 stars
Python, 2 awesome lists

LiveBench/LiveBench

LiveBench: A Challenging, Contamination-Free LLM Benchmark

1179 stars
Python, 1 awesome list

hkust-nlp/WebExplorer

The official repo of "WebExplorer: Explore and Evolve for Training Long-Horizon Web Agents"

118 stars
Python, 1 awesome list

Tracked growth

2 captures since 2026-05-23

Latest capture 2026-05-31 03:03

[Chart: Stars history (total stars)]

[Chart: Commits history (default branch commits)]

Metadata

AI development signals

AI agent config detected

7 config paths: 7 files, 0 directories
Claude Code: 7

Review config paths
  • Claude Code cache_manager_web/CLAUDE.md
  • Claude Code CLAUDE.md
  • Claude Code mind2web2/api_tools/CLAUDE.md
  • Claude Code mind2web2/CLAUDE.md
  • Claude Code mind2web2/llm_client/CLAUDE.md
  • Claude Code mind2web2/prompts/CLAUDE.md
  • Claude Code mind2web2/utils/CLAUDE.md