Open highlighted repo slot
Put your repository first
Promote a GitHub repo at the top of Awesome repository list views for 7 days.
Awesome-list intelligence for GitHub
Discover projects curated by awesome-list maintainers, then narrow them by stars, age, freshness, archive status, language, topics, generated tags, detected stacks, package managers, and source list.
Build, Evaluate, and Deploy GUI Agents — online RL training, standardized benchmarks, and real-device deployment in one framework.
Coding agent singularly focused on efficiency and context curation. Reduces API costs by 50-80% vs. other agents and improves code quality at the same time. Uses hash-anchored edits, massively parallel operations, AST manipulation, and many other optimizations. https://dirac.run/
Code and Data for Tau-Bench
A recursive, self-improving harness designed to help your agents (and future iterations of those agents) succeed on any task.
Give Claude Code a memory that evolves with your codebase. Hooks automatically capture sessions, the Claude Agent SDK extracts key decisions and lessons, and an LLM compiler organizes everything into structured, cross-referenced knowledge articles - inspired by Karpathy's LLM Knowledge Base architecture.
Weave is a toolkit for developing AI-powered applications, built by Weights & Biases.
Open-source AI agent harness in native Rust — GUI, CLI, headless, and webapp from one binary. Multi-provider, MCP, skills, plugins, agent teams.
Open-source, end-to-end platform for evaluating, observing, and improving LLM and AI agent applications. Tracing · Evals · Simulations · Datasets · Gateway · Guardrails. Self-hostable. Apache 2.0.
Reference code for the Meta-Harness paper.
The MCP server that turns Claude into the only coding agent hitting 100% on a real benchmark. -77% active tokens, -76% wall time, 0 losses across 96 tasks on Claude Opus 4.7. Structural code navigation + persistent memory. Works with every MCP client.
Minimal and readable coding agent harness implementation in Python to explain the core components of coding agents.
A production-ready runtime framework for agent apps with secure tool sandboxing, Agent-as-a-Service APIs, scalable deployment, full-stack observability, and broad framework compatibility.
Official, AWS-supported MCP servers, skills, and plugins to help AI agents build on AWS
Stash — persistent memory layer for AI agents. Episodes, facts, and working context stored in Postgres. MCP server included. Self-hosted, single binary, no cloud required.
Every practical and proposed defense against prompt injection.
Claw-Eval is a harness for evaluating LLMs as agents. All tasks are verified by humans.
Bring your own agent and build a self-improving agentic system. Automatically mine failures, optimize the agent harness, and gate against regressions.
Audit-grade multi-agent orchestration for CLI coding agents (Claude Code, Codex, Gemini CLI, +40 more). HMAC-chained audit log, signed agent cards, per-artefact lineage, air-gap deploy. The orchestrator your compliance team will sign off on. https://bernstein.run
Action-aware permissions for coding agents. A deterministic safety guard that keeps you in the flow.
State machine guardrails for AI agents
Browser Use Box: a 24/7 Claude Code agent for Playwright-style browser automation with Browser Use Cloud, Telegram, and a real browser on any box you own.
React components for visualizing traces from AI agents
MCP Fusion - The TypeScript framework for secure MCP servers.
tui-use lets agents interact with programs that expect a human at the keyboard — REPLs, debuggers, TUI apps, and anything else bash can't reach.
Demonstration of an agent harness with access to tools like Slack, GitHub, and Linear so it can act as your own complete software engineer.
Meta Harness Implementation
Continual harness optimization
SWE-bench repro script for running confucius-code-agent (CCA).
Automated harness evolution for AI agents. A Claude Code plugin that iteratively optimizes system prompts, routing, retrieval, and orchestration code using full-trace counterfactual diagnosis. Based on Meta-Harness (Lee et al., 2026).
Agent debugging skill. Stop AI debugging guesswork with runtime evidence.