TheAgentCompany/TheAgentCompany
An agent benchmark with tasks in a simulated software company.
Evaluation harness for OpenHands V1.
SWE-Bench Pro: Can AI Agents Solve Long-Horizon Software Engineering Tasks?
An in-the-wild benchmark for AI agents in the OpenClaw Environment.
SWE-bench: Can Language Models Resolve Real-world Github Issues?
DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents
A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)
AI agent config detected
Key config paths
AGENTS.md