huggingface/datatrove

Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.

  • Stars: 3,066
  • Forks: 266
  • Commits: 725
  • Language: Python
  • Awesome lists: 2

Similar repositories

bespokelabsai/curator

Synthetic data curation for post-training and structured data extraction

1678 stars
Python 1 awesome list

RUC-NLPIR/FlashRAG

⚡FlashRAG: A Python Toolkit for Efficient RAG Research (WWW2025 Resource)

3494 stars
Python 1 awesome list

PKU-YuanGroup/Helios

Helios: Real Real-Time Long Video Generation Model

1850 stars
Python 1 awesome list

melih-unsal/DemoGPT

🤖 Create LLM agents in a second with your prompts. Everything you need to create an LLM Agent - tools, prompts, frameworks, and models - all in one place.

1897 stars
Python 3 awesome lists

ucbepic/docetl

A system for agentic LLM-powered data processing and ETL

3755 stars
Python 1 awesome list

Tracked growth

1 capture since 2026-05-25

Latest capture 2026-05-25 21:00

Stars history (chart: total stars)

Commits history (chart: default branch commits)

Metadata

  • Created: 2023-06-14
  • First commit: —
  • Last pushed: 2026-05-06
  • Archived: no
  • Stack detected: —
  • License: Apache-2.0

AI development signals

AI agent config detected

1 config path, 1 file, 0 directories
Agent instructions

Key config paths

  • file AGENTS.md