ggml-org/llama.cpp
LLM inference in C/C++
A language for constraint-guided and efficient LLM programming.
Fast Multimodal LLM on Mobile Devices
Running large language models on a single GPU for throughput-oriented scenarios.
A fast inference library for running LLMs locally on modern consumer-class GPUs
A framework for few-shot evaluation of language models.