mit-han-lab/duo-attention

[ICLR 2025] DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads

Stars: 541
Forks: 41
Commits: 20
Language: Python
Awesome lists: 0

Similar repositories

OptimalScale/LMFlow

An Extensible Toolkit for Finetuning and Inference of Large Foundation Models. Large Models for All.

8487 stars · Python · 2 awesome lists

Tiiny-AI/PowerInfer

High-speed Large Language Model Serving for Local Deployment

9527 stars · C++ · 1 awesome list

jianzhnie/LLamaTuner

Easy and Efficient Finetuning LLMs. (Supported LLama, LLama2, LLama3, Qwen, Baichuan, GLM, Falcon) Efficient quantized training and deployment of large models.

620 stars · Python · 1 awesome list

pytorch/ao

PyTorch native quantization and sparsity for training and inference

2844 stars · Python · 1 awesome list

Tracked growth

5 captures since 2026-06-04

Latest capture 2026-06-05 04:06

Stars history (chart: total stars)

Commits history (chart: default branch commits)

Detected stack

Frameworks and tools

  • No framework dependencies detected.
  • Tools: PEP 517, pip

Dependency files

  • pyproject.toml · python · 2 dependencies
  • setup.py · python · 0 dependencies

Metadata

AI development signals

No AI development config files detected.

Appears in

  • No awesome list links recorded.