haotian-liu/LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V-level capabilities and beyond.
VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and cloud.
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks
[ICLR & NeurIPS 2025] Repository for Show-o series, One Single Transformer to Unify Multimodal Understanding and Generation.
LAVIS - A One-stop Library for Language-Vision Intelligence
Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks.