ServiceNow/WorkArena
WorkArena: How Capable are Web Agents at Solving Common Knowledge Work Tasks?
A verified version of the WebArena Benchmark
WorkArena: How Capable are Web Agents at Solving Common Knowledge Work Tasks?
Code repo for "WebArena: A Realistic Web Environment for Building Autonomous Agents"
BrowserGym, a Gym environment for web task automation
Claw-Eval is an evaluation harness for LLMs as agents. All tasks are verified by humans.
DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents