ggml-org/llama.cpp
LLM inference in C/C++
Reliable model swapping for any local OpenAI/Anthropic-compatible server: llama.cpp, vLLM, etc.
LLM inference in C/C++
llama.cpp fork with additional SOTA quants and improved performance
A self-hosted, offline, ChatGPT-like chatbot. Powered by Llama 2. 100% private, with no data leaving your device. New: Code Llama support!
llama.go is like llama.cpp in pure Golang!
Kubernetes operator for local LLM inference with llama.cpp, vLLM, TGI, and mlx-server — multi-GPU NVIDIA + Apple Silicon Metal, autoscaling, air-gapped, production-ready
Distributed LLM inference. Connect home devices into a powerful cluster to accelerate LLM inference. More devices means faster inference.
AI agent config detected
Key config paths
.coderabbit.yaml
AGENTS.md
CLAUDE.md