michaelfeil/infinity
Infinity is a high-throughput, low-latency serving engine for text embeddings, reranking models, CLIP, CLAP, and ColPali.
A blazing-fast inference solution for text embedding models
All-in-one platform for search, recommendations, RAG, and analytics offered via API
A framework for few-shot evaluation of language models.
An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries
Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 600+ LLMs (Qwen3.6, DeepSeek-R1, GLM-5.1, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, GLM4.5v, Gemma4, Llava, Phi4, ...) (AAAI 2025).
LLM inference in C/C++