triton-inference-server/server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.

Stars: 10,691
Forks: 1,782
Commits: 3,776
Language: Python
Awesome lists: 1

Similar repositories

triton-lang/triton

Development repository for the Triton language and compiler

19,275 stars
MLIR, 1 awesome list

OpenNMT/CTranslate2

Fast inference engine for Transformer models

4,494 stars
C++, 1 awesome list

ai-dynamo/dynamo

A Datacenter Scale Distributed Inference Serving Framework

7,073 stars
Rust, 1 awesome list

NVIDIA/Model-Optimizer

A unified library of SOTA model optimization techniques like quantization, pruning, distillation, speculative decoding, etc. It compresses deep learning models for downstream deployment frameworks like TensorRT-LLM, TensorRT, vLLM, etc. to optimize inference speed.

2,759 stars
Python, 1 awesome list

devflowinc/trieve

All-in-one platform for search, recommendations, RAG, and analytics offered via API

2,662 stars
Rust, 1 awesome list

NVIDIA/TensorRT-LLM

TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT LLM also contains components to create Python and C++ runtimes that orchestrate the inference execution in a performant way.

13,725 stars
Python, 2 awesome lists

Tracked growth

1 capture since 2026-05-25

Latest capture 2026-05-25 21:20

Stars history (chart: total stars)

Commits history (chart: default branch commits)

Metadata

AI development signals

No AI development config files detected.