TheAgentCompany/TheAgentCompany
An agent benchmark with tasks in a simulated software company.
The Abstraction and Reasoning Corpus
A project-structure-aware autonomous software engineer aiming for autonomous program improvement. It resolved 37.3% of tasks (pass@1) in SWE-bench lite and 46.2% of tasks (pass@1) in SWE-bench verified, with each task costing less than $0.7.
This repository contains various advanced techniques for Retrieval-Augmented Generation (RAG) systems.
The first open-source harness builder for AI coding. Make AI coding deterministic and repeatable.
AutoRAG: An Open-Source Framework for Retrieval-Augmented Generation (RAG) Evaluation & Optimization with AutoML-Style Automation
RAG evaluation without the need for "golden answers"