scaleapi/SWE-bench_Pro-os

SWE-Bench Pro: Can AI Agents Solve Long-Horizon Software Engineering Tasks?

  • Stars: 398
  • Forks: 66
  • Commits: 75
  • Language: Python
  • Awesome lists: 1

Similar repositories

SWE-bench/SWE-bench

SWE-bench: Can Language Models Resolve Real-world Github Issues?

5010 stars
Python, 2 awesome lists

TIGER-AI-Lab/ClawBench

Open-source benchmark for browser AI agents on 153 everyday online tasks across 144 live websites. 5-layer recording + DOM-match + LLM judge. Top score 33.3%.

347 stars
Python, 1 awesome list

openai/mle-bench

MLE-bench is a benchmark for measuring how well AI agents perform at machine learning engineering

1540 stars
Python, 1 awesome list

InternLM/WildClawBench

An in-the-wild benchmark for AI agents in the OpenClaw Environment.

407 stars
Python, 1 awesome list

Tracked growth

1 capture since 2026-05-25

Latest capture 2026-05-25 20:57

[Chart: Stars history (total stars)]

[Chart: Commits history (default branch commits)]

Metadata

  • Created: 2025-09-05
  • First commit: —
  • Last pushed: 2026-05-18
  • Archived: no
  • Stack detected: —
  • License: MIT

AI development signals

No AI development config files detected.