pytorch/ao
PyTorch native quantization and sparsity for training and inference
Accessible large language models via k-bit quantization for PyTorch.
A unified library of SOTA model optimization techniques like quantization, pruning, distillation, speculative decoding, etc. It compresses deep learning models for downstream deployment frameworks like TensorRT-LLM, TensorRT, vLLM, etc. to optimize inference speed.
QLoRA: Efficient Finetuning of Quantized LLMs
Easy and efficient finetuning of LLMs (supports Llama, Llama2, Llama3, Qwen, Baichuan, GLM, Falcon). Efficient quantized training and deployment of large models.
High-speed Large Language Model Serving for Local Deployment
An Extensible Toolkit for Finetuning and Inference of Large Foundation Models. Large Models for All.
AI agent config detected
Key config paths
CLAUDE.md