ServiceNow/BrowserGym
BrowserGym, a Gym environment for web task automation
WorkArena: How Capable are Web Agents at Solving Common Knowledge Work Tasks?
Code repo for "WebArena: A Realistic Web Environment for Building Autonomous Agents"
A verified version of the WebArena Benchmark
A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)
An agent benchmark with tasks in a simulated software company.
[NeurIPS'25 D&B] Mind2Web-2 Benchmark: Evaluating Agentic Search with Agent-as-a-Judge