huggingface/text-embeddings-inference
A blazing fast inference solution for text embeddings models
Infinity is a high-throughput, low-latency serving engine for text embeddings, reranking models, CLIP, CLAP, and ColPali
The AI-native database built for LLM applications, providing incredibly fast hybrid search of dense vector, sparse vector, tensor (multi-vector), and full-text.
💡 All-in-one AI framework for semantic search, LLM orchestration and language model workflows
State-of-the-art Machine Learning for the web. Run 🤗 Transformers directly in your browser, with no need for a server!
A minimal Python framework for building custom AI inference servers with full control over logic, batching, and scaling.
Fine-tune, build, and deploy open-source LLMs easily!