vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
Fast Multimodal LLM on Mobile Devices
Achieve state of the art inference performance with modern accelerators on Kubernetes
Universal LLM Deployment Engine with ML Compilation
A high-performance inference engine for LLM, VLM, DiT and REC models, optimized for diverse AI accelerators.
TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT LLM also contains components to create Python and C++ runtimes that orchestrate the inference execution in a performant way.
1 capture since 2026-05-25
AI agent config detected
Key config paths
.claude
.claude/skills
.claude/skills/docker-build
.claude/skills/docker-build/SKILL.md
.claude/skills/support-new-model
.claude/skills/support-new-model/SKILL.md
CLAUDE.md