InternLM/WildClawBench

An in-the-wild benchmark for AI agents in the OpenClaw Environment.

Stars: 407
Forks: 37
Commits: 10
Language: Python
Awesome lists: 1

Similar repositories

TIGER-AI-Lab/ClawBench

Open-source benchmark for browser AI agents on 153 everyday online tasks across 144 live websites. 5-layer recording + DOM-match + LLM judge. Top score 33.3%.

347 stars
Python · 1 awesome list

claw-eval/claw-eval

Claw-Eval is an evaluation harness for evaluating LLMs as agents. All tasks are verified by humans.

632 stars
Python · 2 awesome lists

pinchbench/skill

PinchBench is a benchmarking system for evaluating LLM models as OpenClaw coding agents. Made with 🦀 by the humans at https://kilo.ai

1196 stars
Python · 1 awesome list

ZJU-REAL/ClawGUI

Build, Evaluate, and Deploy GUI Agents — online RL training, standardized benchmarks, and real-device deployment in one framework.

1272 stars
Python · 1 awesome list

vivekchand/clawmetry

See your agent think. Real-time observability dashboard for OpenClaw AI agents.

342 stars
Python · 1 awesome list

Tracked growth

1 capture since 2026-05-25

Latest capture 2026-05-25 20:52

[Chart: Stars history (total stars)]

[Chart: Commits history (default branch commits)]

Metadata

AI development signals

No AI development config files detected.