meta-llama/llama
Inference code for Llama models
Distributed LLM inference. Connect home devices into a powerful cluster to accelerate LLM inference. More devices means faster inference.
Inference code for Llama models
Utilities intended for use with Llama models.
Distribute and run LLMs with a single file.
LLM inference in C/C++
Easy and efficient fine-tuning of LLMs (supports Llama, Llama2, Llama3, Qwen, Baichuan, GLM, Falcon). Efficient quantized training and deployment of large models.
llama.cpp fork with additional SOTA quants and improved performance