allenai/olmocr
Toolkit for linearizing PDFs for LLM datasets/training
Convert PDF to markdown + JSON quickly with high accuracy
Python tool for converting files and office documents to Markdown.
The fastest PDF library for Python and Rust. Text extraction, image extraction, markdown conversion, PDF creation & editing. 0.8ms mean, 5× faster than industry leaders, 100% pass rate on 3,830 PDFs. MIT/Apache-2.0.
Use LLMs and LLM Vision (OCR) to handle paperless-ngx - Document Digitalization powered by AI
☁️ The fastest HTML to markdown converter on GitHub. Optimized for LLMs and supports streaming.
Get your documents ready for gen AI
pyproject.toml · python · 42 dependencies
poetry.lock · python · 0 dependencies