evalplus/evalplus
Rigorous evaluation of LLM-synthesized code - NeurIPS 2023 & COLM 2024
One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks
Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends
Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks
A framework for few-shot evaluation of language models.
VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and cloud.