raphael-seo/Versatile-OCR-Program
Multi-modal OCR pipeline optimized for ML training (text, figure, math, tables, diagrams)
Repository profile
Python-based tools for document analysis and OCR
Repository updates
Get generated ocropus-archive/DUP-ocropy development summaries by email, or follow the weekly and monthly RSS feeds.
Tracked growth, recent movement, and commit velocity from stored repository snapshots.
Latest capture 2026-06-24 13:22
1 capture since 2026-06-24
Stars from baseline: 0
Frameworks, package managers, ecosystems, and dependency manifests found during catalog scans.
Scanned 2026-06-24 13:22
requirements.txt: Python ecosystem, 5 dependencies
setup.py: Python ecosystem, 0 dependencies
Nearest indexed repositories by embedding similarity.
Multi-modal OCR pipeline optimized for ML training (text, figure, math, tables, diagrams)
OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched
Toolkit for linearizing PDFs for LLM datasets/training
Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.
The ultimate training toolkit for finetuning diffusion models
An open source implementation of CLIP.