kvcache-ai/ktransformers
A Flexible Framework for Experiencing Heterogeneous LLM Inference/Fine-tune Optimizations
Fast inference engine for Transformer models
A unified library of SOTA model optimization techniques such as quantization, pruning, distillation, and speculative decoding. It compresses deep learning models for downstream deployment frameworks such as TensorRT-LLM, TensorRT, and vLLM to optimize inference speed.
Ongoing research training transformer models at scale
Faster Whisper transcription with CTranslate2
Fast GPT-2 inference written in Fortran
State-of-the-Art Embeddings, Retrieval, and Reranking