confident-ai/deepeval
The LLM Evaluation Framework
Readymade evaluators for your LLM apps
The LLM Evaluation Framework
Rigorous evaluation of LLM-synthesized code - NeurIPS 2023 & COLM 2024
One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks
Prompt Engineering | Prompt Versioning | Use GPT or other prompt-based models to get structured output. Join our Discord for prompt engineering, LLMs, and other recent research
A framework for prompt tuning using Intent-based Prompt Calibration