huggingface/text-embeddings-inference
A blazing fast inference solution for text embeddings models
A fast, accurate, lightweight Python library for generating state-of-the-art embeddings
Infinity is a high-throughput, low-latency serving engine for text embeddings, reranking models, CLIP, CLAP, and ColPali
Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 600+ LLMs (Qwen3.6, DeepSeek-R1, GLM-5.1, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, GLM4.5v, Gemma4, Llava, Phi4, ...) (AAAI 2025).
A dead-simple API to build LLM-powered apps
Fast GPT-2 inference written in Fortran
Postgres with GPUs for ML/AI apps.