open-compass/opencompass
OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) over 100+ datasets.
A streamlined and customizable framework for efficient large model (LLM, VLM, AIGC) evaluation and performance benchmarking.
Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks
Rigorous evaluation of LLM-synthesized code - NeurIPS 2023 & COLM 2024
Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends
One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks
2 captures since 2026-05-25
AI agent config detected
Key config paths
.github/copilot-instructions.md
AGENTS.md
docs/en/get_started/supported_dataset/agent.md
docs/zh/get_started/supported_dataset/agent.md