ggml-org/llama.cpp
LLM inference in C/C++
🏗️ Fine-tune, build, and deploy open-source LLMs easily!
A self-hosted, offline, ChatGPT-like chatbot. Powered by Llama 2. 100% private, with no data leaving your device. New: Code Llama support!
A unified library of state-of-the-art model optimization techniques such as quantization, pruning, distillation, and speculative decoding. It compresses deep learning models for downstream deployment frameworks such as TensorRT-LLM, TensorRT, and vLLM to optimize inference speed.
Run GGUF models easily with a KoboldAI UI. One File. Zero Install.
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
LocalAI is the open-source AI engine. Run any model - LLMs, vision, voice, image, video - on any hardware. No GPU required.