huggingface/lighteval
Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends
The LLM Evaluation Framework
Rigorous evaluation of LLM-synthesized code - NeurIPS 2023 & COLM 2024
Regression testing for AI agents. Snapshot behavior, diff tool calls, catch regressions in CI. Works with LangGraph, CrewAI, OpenAI, Anthropic.
DeepTeam is a framework to red team LLMs and LLM systems.
One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks
Dingo: A Comprehensive AI Data, Model and Application Quality Evaluation Tool
6 captures since 2026-05-23
pyproject.toml · python · 57 dependencies
poetry.lock · python · 0 dependencies
docs/package.json · javascript · 26 dependencies
docs/yarn.lock · javascript · 512 dependencies