GitHub projects from awesome lists
Search names, descriptions, topics, tags, and stacks, then tune results by ecosystem, freshness, health, and cross-list signal.
The Cloud Sandbox Built for AI Agents
Open-source, end-to-end platform for evaluating, observing, and improving LLM and AI agent applications. Tracing · Evals · Simulations · Datasets · Gateway · Guardrails. Self-hostable. Apache 2.0.
Open-source AI agent harness in native Rust — GUI, CLI, headless, and webapp from one binary. Multi-provider, MCP, skills, plugins, agent teams.
Reference code for the Meta-Harness paper.
Portable, vendor-agnostic agent harness for project-specific skills, workflows, and agent teams aligned with your codebase, conventions, and engineering standards.
OpenTelemetry Instrumentation for AI Observability
Evaluate and improve models and agents using environments
Tensorlake is a serverless runtime for running sandboxes and deploying background agentic applications
The Agent Harness for AI-Human Collaboration, inspired by the AI-DLC (AI-Driven Development Lifecycle)
Minimal and readable coding agent harness implementation in Python to explain the core components of coding agents.
A fully customizable and self-hosted sandboxing solution for AI agent code execution and computer use. It features out-of-the-box support for backtracking, a simple REST API and Python SDK, automatic port forwarding, and secure MicroVM isolation. Perfect for safely running, testing, and backtracking multi-step agent workflows.
Ultimate Context Engineering Infrastructure, starting from MCPs and Integrations
A production-ready runtime framework for agent apps with secure tool sandboxing, Agent-as-a-Service APIs, scalable deployment, full-stack observability, and broad framework compatibility.
An agent benchmark with tasks in a simulated software company.
CheetahClaws: A Fast and Easy-to-Use Agent Harness Infrastructure for Long-Horizon, Multi-Model, and Tool-Using AI Systems
Claw-Eval is a harness for evaluating LLMs as agents. All tasks are verified by humans.
🤖 MateClaw — Your second brain with Multi-Agent Orchestration, MCP Protocol, Skills & Memory, Dream, and Multi-Channel Support. Built on Spring AI Alibaba.
Collection of evals for Inspect AI
Sandboxed code execution for AI agents, locally or on the cloud. Massively parallel, easy to extend. Powering SWE-agent and more.
Bring your own agent and build a self-improving agentic system. Automatically mine failures, optimize the agent harness, and gate against regressions.
Batteries for your Pydantic AI agent.
Safe runtime for autonomous on-chain AI agents: isolated sandboxes, Library skills, encrypted secrets, and OKX read-only security checks.
Deterministic safety solutions for probabilistic AI agents
Open-source cross-agent memory layer for coding agents via MCP. Compatible with Cursor, Claude Code, Codex, Windsurf, Gemini CLI, GitHub Copilot, Kiro, OpenCode, Antigravity, and Trae.
An in-the-wild benchmark for AI agents in the OpenClaw Environment.
SWE-Bench Pro: Can AI Agents Solve Long-Horizon Software Engineering Tasks?
A collection of Model Context Protocol (MCP) servers, clients and developer tools by IBM.
A generative AI-powered framework for testing virtual agents.
Self-hosted OpenClaw gateway + agent runtime in .NET (NativeAOT-friendly)