  • Stars: 347
  • Forks: 21
  • Commits: 310
  • Language: Python
  • Awesome lists: 1

Similar repositories

InternLM/WildClawBench

An in-the-wild benchmark for AI agents in the OpenClaw Environment.

407 stars
Python, 1 awesome list

pinchbench/skill

PinchBench is a benchmarking system for evaluating LLM models as OpenClaw coding agents. Made with 🦀 by the humans at https://kilo.ai

1196 stars
Python, 1 awesome list

claw-eval/claw-eval

Claw-Eval is an evaluation harness for evaluating LLM as agents. All tasks verified by humans.

632 stars
Python, 2 awesome lists

hidai25/eval-view

Regression testing for AI agents. Snapshot behavior, diff tool calls, catch regressions in CI. Works with LangGraph, CrewAI, OpenAI, Anthropic.

112 stars
Python, 1 awesome list

scaleapi/SWE-bench_Pro-os

SWE-Bench Pro: Can AI Agents Solve Long-Horizon Software Engineering Tasks?

398 stars
Python, 1 awesome list

Tracked growth

1 capture since 2026-05-30

Latest capture 2026-05-30 10:50

Charts: stars history (total stars) and commits history (default branch commits).

Metadata

  • Created: 2026-04-10
  • First commit: 2026-04-10
  • Last pushed: 2026-05-25
  • Website: https://claw-bench.com
  • Archived: no
  • Stack detected: —
  • License: Apache-2.0

AI development signals

AI agent config detected

1 config path, 1 file, 0 directories
Agent instructions

Key config paths

  • File: AGENTS.md