llm-d/llm-d
Achieve state of the art inference performance with modern accelerators on Kubernetes
A minimal Python framework for building custom AI inference servers with full control over logic, batching, and scaling.
A high-throughput and memory-efficient inference and serving engine for LLMs
Low-code framework for building custom LLMs, neural networks, and other AI models
Open source platform for AI Engineering: OpenTelemetry-native LLM Observability, GPU Monitoring, Guardrails, Evaluations, Prompt Management, Vault, Playground. Integrates with 50+ LLM Providers, VectorDBs, Agent Frameworks and GPUs.
Pretrain, finetune ANY AI model of ANY size on 1 or 10,000+ GPUs with zero code changes.
A powerful AI framework with structured Pydantic responses, flexible LLM integration, and advanced agent capabilities