modelscope/evalscope

A streamlined and customizable framework for efficient large model (LLM, VLM, AIGC) evaluation and performance benchmarking.

Stars: 2,844
Forks: 340
Commits: 777
Language: Python
Awesome lists: 2

Similar repositories

open-compass/opencompass

OpenCompass is an LLM evaluation platform supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) across 100+ datasets.

7025 stars
Python, 3 awesome lists

open-compass/VLMEvalKit

Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks

4151 stars
Python, 2 awesome lists

evalplus/evalplus

Rigorous evaluation of LLM-synthesized code - NeurIPS 2023 & COLM 2024

1749 stars
Python, 2 awesome lists

huggingface/lighteval

Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends

2428 stars
Python, 2 awesome lists

EvolvingLMMs-Lab/lmms-eval

One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks

4157 stars
Python, 1 awesome list

Tracked growth

2 captures since 2026-05-25

Latest capture 2026-05-25 21:09

Stars history (chart: total stars)

Commits history (chart: default branch commits)

Metadata

AI development signals

AI agent config detected

4 config paths: 4 files, 0 directories
Agent instructions 3 GitHub Copilot

Key config paths

  • file .github/copilot-instructions.md
  • file AGENTS.md
  • file docs/en/get_started/supported_dataset/agent.md
  • file docs/zh/get_started/supported_dataset/agent.md