ggml-org/llama.cpp
LLM inference in C/C++
A fast inference library for running LLMs locally on modern consumer-class GPUs
LLM inference in C/C++
Inference code for Llama models
Running Llama 2 and other Open-Source LLMs on CPU Inference Locally for Document Q&A
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
A self-hosted, offline, ChatGPT-like chatbot. Powered by Llama 2. 100% private, with no data leaving your device. New: Code Llama support!
llama.cpp fork with additional SOTA quants and improved performance