flashinfer-ai/flashinfer
FlashInfer: Kernel Library for LLM Serving
Fast and memory-efficient exact attention
Fast GPT-2 inference written in Fortran
A Flexible Framework for Experiencing Heterogeneous LLM Inference/Fine-tune Optimizations
PyTorch native quantization and sparsity for training and inference
An Extensible Toolkit for Finetuning and Inference of Large Foundation Models. Large Models for All.
High-speed Large Language Model Serving for Local Deployment
1 capture since 2026-05-25
AI agent config detected
Key config paths
AGENTS.md
CLAUDE.md