huggingface/transformers
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
Reference implementation of the Transformer architecture optimized for Apple Neural Engine (ANE)
Fast inference engine for Transformer models
🚀 Accelerate inference and training of 🤗 Transformers, Diffusers, TIMM and Sentence Transformers with easy to use hardware optimization tools
Minimalistic large language model 3D-parallelism training
A unified library of SOTA model optimization techniques like quantization, pruning, distillation, speculative decoding, etc. It compresses deep learning models for downstream deployment frameworks like TensorRT-LLM, TensorRT, vLLM, etc. to optimize inference speed.
A Flexible Framework for Experiencing Heterogeneous LLM Inference/Fine-tune Optimizations