Awesome List
An awesome list of Agent Harness engineering resources, including GitHub projects, tools, benchmarks, and practical guides.
GitHub stars and default-branch commits for Picrew/awesome-agent-harness.
227 repos currently saved from this list.
AI Agent Builder and Runtime by Docker Engineering
Agent Orchestration Command Center
Laminar - open-source observability platform purpose-built for AI agents. YC S24.
Amazon Bedrock Agentcore accelerates AI agents into production with the scale, reliability, and security critical to real-world deployment.
A streamlined and customizable framework for efficient large model (LLM, VLM, AIGC) evaluation and performance benchmarking.
A framework-agnostic, git-native standard for defining AI agents
Headless CLI client for stateful Agent Client Protocol (ACP) sessions
The CLI and skills that turn any coding assistant into an expert at creating, evaluating, and deploying AI agents on Google Cloud.
AI-Driven Life Cycle (AI-DLC) adaptive workflow steering rules for AI coding agents
A Tool to Visualize Claude Code's LLM Interactions
agent-sandbox enables easy management of isolated, stateful, singleton workloads, ideal for use cases like AI agent runtimes.
The NVIDIA NeMo Agent toolkit is an open-source library for efficiently connecting and optimizing teams of AI agents.
A benchmark for LLMs on complicated tasks in the terminal
AI Agent Governance Toolkit: policy enforcement, zero-trust identity, execution sandboxing, and reliability engineering for autonomous AI agents. Covers all 10 risks in the OWASP Agentic Top 10.
Harbor is a framework for running agent evaluations and creating and using RL environments.
One brain, many harnesses. Portable .agent/ folder (memory + skills + protocols) that plugs into Claude Code, Cursor, Windsurf, OpenCode, OpenClaw, Hermes, or DIY Python — and keeps its knowledge when you switch.
Build applications that make decisions (chatbots, agents, simulations, etc.). Monitor, trace, persist, and execute on your own infrastructure.
Official Microsoft Learn MCP Server and CLI tool – powering LLMs and AI agents with real-time, trusted Microsoft docs & code samples.
No description.
Ship your code, on autopilot. An open source agent that lives on your machines 24/7 and keeps your apps running. 🦀
Code repo for "WebArena: A Realistic Web Environment for Building Autonomous Agents"
An HTTP credential proxy and vault for AI agents like Claude Code, OpenClaw, Hermes, custom agents and harnesses, and more.
Open-source AI agent desktop app for Windows & macOS. One-click install Claude Code, MCP tools, and Skills — with sandbox isolation, multi-model support, and Feishu/Slack integration.
Run Coding Agents in Sandboxes. Control Them Over HTTP. Supports Claude Code, Codex, OpenCode, and Amp.
LLM-powered fuzzing via OSS-Fuzz.
E2B Desktop Sandbox for LLMs. An E2B Sandbox with a desktop graphical environment that you can connect to any LLM for secure computer use.
An engineering decisions engine that knows when decisions go stale. Frame, compare, decide, with evidence decay and parity enforcement. For Claude Code, Cursor, Gemini CLI, Codex, and more.
A Kubernetes-native control plane for AI agent instance management, with governed AI access, runtime orchestration, and reusable resources across multiple agent runtimes.
The batteries-included agent harness.
τ-Bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains
The Cloud Sandbox Built for AI Agents
Open-source, end-to-end platform for evaluating, observing, and improving LLM and AI agent applications. Tracing · Evals · Simulations · Datasets · Gateway · Guardrails. Self-hostable. Apache 2.0.
Open-source AI agent harness in native Rust — GUI, CLI, headless, and webapp from one binary. Multi-provider, MCP, skills, plugins, agent teams.
Portable, vendor-agnostic agent harness for project-specific skills, workflows, and agent teams aligned with your codebase, conventions, and engineering standards.
OpenTelemetry Instrumentation for AI Observability
Reference code for the Meta-Harness paper.
Tensorlake is a serverless runtime for sandboxes and for deploying background agentic applications.
Evaluate and improve models and agents using environments
The Agent Harness for AI-Human Collaboration, inspired by the AI-DLC (AI-Driven Development Lifecycle)
Minimal and readable coding agent harness implementation in Python to explain the core components of coding agents.
A fully customizable and self-hosted sandboxing solution for AI agent code execution and computer use. It features out-of-the-box support for backtracking, a simple REST API and Python SDK, automatic port forwarding, and secure MicroVM isolation. Perfect for safely running, testing, and backtracking multi-step agent workflows.
Ultimate Context Engineering Infrastructure, starting from MCPs and Integrations
No description.
A production-ready runtime framework for agent apps with secure tool sandboxing, Agent-as-a-Service APIs, scalable deployment, full-stack observability, and broad framework compatibility.
CheetahClaws: A Fast and Easy-to-Use Agent Harness Infrastructure for Long-Horizon, Multi-Model, and Tool-Using AI Systems
An agent benchmark with tasks in a simulated software company.
Claw-Eval is an evaluation harness for evaluating LLMs as agents. All tasks are verified by humans.
🤖 MateClaw — Your second brain with Multi-Agent Orchestration, MCP Protocol, Skills & Memory, Dream, and Multi-Channel Support. Built on Spring AI Alibaba.
Collection of evals for Inspect AI
Bring your own agent and build a self-improving agentic system. Automatically mine failures, optimize the agent harness, and gate against regressions.