raphael-seo/Versatile-OCR-Program
Multi-modal OCR pipeline optimized for ML training (text, figure, math, tables, diagrams)
Repository profile
OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched.
Tracked growth, recent movement, and commit velocity from stored repository snapshots.
Latest capture 2026-06-24 13:17
1 capture since 2026-06-24
Stars from baseline 0
Frameworks, package managers, ecosystems, and dependency manifests found during catalog scans.
Scanned 2026-06-24 13:17
pyproject.toml: Python ecosystem, 38 dependencies
uv.lock: Python ecosystem, 0 dependencies
Searchable topics, generated tags, and stack labels that explain where this repository fits.
Agent instructions and tool configuration paths found in the repository tree.
Nearest indexed repositories by embedding similarity.
Multi-modal OCR pipeline optimized for ML training (text, figure, math, tables, diagrams)
Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, Cyrillic, and others.
The fastest PDF library for Python and Rust. Text extraction, image extraction, markdown conversion, PDF creation & editing. 0.8ms mean, 5× faster than industry leaders, 100% pass rate on 3,830 PDFs. MIT/Apache-2.0.
A Python library for reading and writing PDF, powered by QPDF
Write-capable PDF toolkit for any MCP client: 22 tools to read, create, render, encrypt, and transform PDFs. Vision rendering for scans, form-preserving merge and split, AES-256, zero native dependencies.
Use LLMs and LLM Vision (OCR) to handle paperless-ngx - Document Digitalization powered by AI