
OpenHands/benchmarks

Evaluation harness for OpenHands V1.

Stars: 85
Forks: 63
Commits: 414
Language: Python
Awesome lists: 1

Similar repositories

scaleapi/SWE-bench_Pro-os

SWE-Bench Pro: Can AI Agents Solve Long-Horizon Software Engineering Tasks?

398 stars · Python · 1 awesome list

InternLM/WildClawBench

An in-the-wild benchmark for AI agents in the OpenClaw Environment.

407 stars · Python · 1 awesome list

SWE-bench/SWE-bench

SWE-bench: Can Language Models Resolve Real-world Github Issues?

5010 stars · Python · 2 awesome lists

THUDM/AgentBench

A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)

3451 stars · Python · 4 awesome lists

Tracked growth

1 capture since 2026-05-25

Latest capture 2026-05-25 20:56

[Chart: Stars history (total stars)]

[Chart: Commits history (default branch commits)]

Metadata

  • Created: 2025-09-02
  • First commit: —
  • Last pushed: 2026-05-24
  • Archived: no
  • Stack detected: —
  • License: MIT

AI development signals

AI agent config detected

1 config path, 1 file, 0 directories

Agent instructions

Key config paths

  • file AGENTS.md