THUDM/AgentBench
A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)
A generalized information-seeking agent system with Large Language Models (LLMs).
Agent framework and applications built upon Qwen>=3.0, featuring Function Calling, MCP, Code Interpreter, RAG, Chrome extension, etc.
A lightweight framework for building LLM-based agents
[ICLR 2026] LLM/VLM gaming agents and model evaluation through games.
JARVIS, a system to connect LLMs with the ML community. Paper: https://arxiv.org/pdf/2303.17580.pdf
Tongyi Deep Research, the Leading Open-source Deep Research Agent