
datalab-to/marker

Convert PDF to markdown + JSON quickly with high accuracy

FastAPI · Jupyter · pytest · Streamlit
  • Stars: 35,658
  • Forks: 2,465
  • Commits: 1,359
  • Language: Python
  • Awesome lists: 1

Similar repositories

allenai/olmocr

Toolkit for linearizing PDFs for LLM datasets/training

17,353 stars · Python · 1 awesome list

microsoft/markitdown

Python tool for converting files and office documents to Markdown.

125,147 stars · Python · 4 awesome lists

yfedoseev/pdf_oxide

The fastest PDF library for Python and Rust. Text extraction, image extraction, markdown conversion, PDF creation & editing. 0.8ms mean, 5× faster than industry leaders, 100% pass rate on 3,830 PDFs. MIT/Apache-2.0.

786 stars · Rust · 2 awesome lists

icereed/paperless-gpt

Use LLMs and LLM Vision (OCR) to handle paperless-ngx - Document Digitalization powered by AI

2,378 stars · Go · 2 awesome lists

harlan-zw/mdream

☁️ The fastest HTML to markdown convertor on GitHub. Optimized for LLMs and supports streaming.

897 stars · TypeScript · 0 awesome lists

Tracked growth

2 captures since 2026-05-25

Latest capture 2026-06-02 07:35

[Chart: Stars history (total stars)]

[Chart: Commits history (default branch commits)]

Detected stack

Frameworks and tools

  • FastAPI · web framework · high confidence
  • Jupyter · notebook · high confidence
  • pytest · test framework · high confidence
  • Streamlit · app framework · high confidence
  • Poetry

Dependency files

  • pyproject.toml · python · 42 dependencies
  • poetry.lock · python · 0 dependencies

Metadata

  • Created: 2023-10-30
  • First commit: 2023-10-30
  • Last pushed: 2026-05-05
  • Website: https://www.datalab.to
  • Archived: no
  • Stack detected: 2026-06-02 07:35
  • License: GPL-3.0

AI development signals

No AI development config files detected.