Awesome List
An awesome list of Agent Harness engineering resources, including GitHub projects, tools, benchmarks, and practical guides.
GitHub stars and default-branch commits for Picrew/awesome-agent-harness.
227 repos currently saved from this list.
Sandboxed code execution for AI agents, locally or in the cloud. Massively parallel, easy to extend. Powering SWE-agent and more.
Batteries for your Pydantic AI agent.
Deterministic safety solutions for probabilistic AI agents
Open-source cross-agent memory layer for coding agents via MCP. Compatible with Cursor, Claude Code, Codex, Windsurf, Gemini CLI, GitHub Copilot, Kiro, OpenCode, Antigravity, and Trae.
Safe runtime for autonomous on-chain AI agents: isolated sandboxes, Library skills, encrypted secrets, and OKX read-only security checks.
An in-the-wild benchmark for AI agents in the OpenClaw Environment.
SWE-Bench Pro: Can AI Agents Solve Long-Horizon Software Engineering Tasks?
A collection of Model Context Protocol (MCP) servers, clients and developer tools by IBM.
A generative AI-powered framework for testing virtual agents.
Self-hosted OpenClaw gateway + agent runtime in .NET (NativeAOT-friendly)
Open-source benchmark for browser AI agents on 153 everyday online tasks across 144 live websites. 5-layer recording + DOM-match + LLM judge. Top score 33.3%.
Secure runtime to sandbox AI agent tasks. Run untrusted code in isolated WebAssembly environments.
The production-ready agent harness framework for Python
🛡️ Decision infrastructure for AI agents. Intercept actions, enforce guard policies, require approvals, and produce audit-ready decision trails.
WorkArena: How Capable are Web Agents at Solving Common Knowledge Work Tasks?
Open Python agent harness for production AI apps: tools, MCP, memory, workspace, telemetry, subagents, background tasks, and OmniServe APIs.
Secure local dev environment for collaboration with AI coding agents
The harness layer for Claude Code — a reference implementation of harness engineering with hook-enforced dual review, state-machine gates that survive context compaction, and fail-closed safety where it counts. Quality gates that AI can't skip.
Runtime for long-horizon agents
HexAgent – An Agent harness that gives any LLM a computer to complete tasks the way humans do
Universally Triggered Agent Harness - An OpenClaw-like Inngest-powered personal agent
Tandem is the authority layer for AI-first work: runtime authority for agents, tools, memory, approvals, and audit trails.
Evaluation harness for OpenHands V1.
This repository defines AGENT.md, a standardized format that lets your codebase speak directly to any agentic coding tool.
A verified version of the WebArena Benchmark