awslabs/agent-evaluation
A generative AI-powered framework for testing virtual agents.
Repository profile
Deterministic execution for non-deterministic AI.
Tracked growth, recent movement, and commit velocity from stored repository snapshots.
Latest capture 2026-06-12 10:50
1 capture since 2026-06-12
Stars from baseline 0
Frameworks, package managers, ecosystems, and dependency manifests found during catalog scans.
Scanned 2026-06-12 10:50
Searchable topics, generated tags, and stack labels that explain where this repository fits.
Agent instructions and tool configuration paths found in the repository tree.
AI agent config detected
Key config paths
Nearest indexed repositories by embedding similarity.
A generative AI-powered framework for testing virtual agents.
Bring your own agent and build a self-improving agentic system. Automatically mine failures, optimize the agent harness, and gate against regressions.
Regression testing for AI agents. Snapshot behavior,diff tool calls,catch regressions in CI. Works with LangGraph, CrewAI, OpenAI, Anthropic.
🧠 Make your agents learn from experience. Now available as a hosted solution at kayba.ai
Collection of evals for Inspect AI
An agent benchmark with tasks in a simulated software company.