Awesome

GitHub projects from awesome lists

Search awesome repositories

Search names, descriptions, topics, tags, and stacks, then tune results by ecosystem, freshness, health, and cross-list signal.

Repos indexed: 17,373
Awesome lists tracked: 125
Current results: 35

35 repos shown

Topic: ocr

opendatalab/MinerU

Transforms complex documents like PDFs and Office docs into LLM-ready markdown/JSON for your Agentic workflows.

Stack

Python FastAPI Gradio pytest PEP 517 pip

GitHub topics

#ai4science #document-analysis #docx #extract-data #layout-analysis #ocr

Updated: 2026-07-14
Lists: 1 list mention
First commit: 2024-02-29
History: 5 history points
License: NOASSERTION
Issues: 35 open

Stars: 74,639

Forks: 6,274
Commits: 5,712 commits
Star growth, last 7 days: 0 0.0%
Commit velocity, last 7 days: 0 0.0%

Website GitHub

hiroi-sora/Umi-OCR

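Free, open-source, offline OCR software. Supports screenshot capture and batch image import, PDF document recognition, exclusion of watermarks and headers/footers, and QR code scanning and generation. Ships with built-in libraries for multiple languages.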

Stack

Python

GitHub topics

#ocr #ocr-python #paddleocr #qml #qt #screenshot

Updated: 2025-11-20
Lists: 1 list mention
First commit: 2023-05-17
History: 5 history points
License: MIT
Issues: 354 open

Stars: 46,060

Forks: 4,529
Commits: 1,267 commits
Star growth, last 7 days: 0 0.0%
Commit velocity, last 7 days: 0 0.0%

GitHub

paperless-ngx/paperless-ngx

A community-supported supercharged document management system: scan, index and archive all your documents

Stack

Python Angular Celery Django Django REST Framework npm pnpm uv

GitHub topics

#angular #archiving #django #dms #document-management #document-management-system

Updated: 2026-07-17
Lists: 2 list mentions
First commit: 2015-12-20
History: 92 history points
License: GPL-3.0
Issues: 8 open

Stars: 43,136

Forks: 2,900
Commits: 11,606 commits
Star growth, last 7 days: +234 +0.5%
Commit velocity, last 7 days: +27 +0.2%

Website GitHub

naptha/tesseract.js

Pure Javascript OCR for more than 100 Languages 📖🎉🖥

Stack

JavaScript npm

GitHub topics

#deep-learning #javascript #ocr #tesseract #webassembly

Updated: 2026-05-17
Lists: 1 list mention
First commit: 2015-06-26
History: 20 history points
License: Apache-2.0
Issues: 45 open

Stars: 38,532

Forks: 2,376
Commits: 846 commits
Star growth, last 7 days: +346 +0.9%
Commit velocity, last 7 days: 0 0.0%

Website GitHub

ocrmypdf/OCRmyPDF

OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched

Stack

Python pytest Streamlit uv

GitHub topics

#image-processing #ocr #pdf #python #tesseract

Updated: 2026-07-03
Lists: 1 list mention
First commit: 2013-04-09
History: 2 history points
License: MPL-2.0
Issues: 95 open

Stars: 34,127

Forks: 2,357
Commits: 4,392 commits
Star growth, last 7 days: 0 0.0%
Commit velocity, last 7 days: 0 0.0%

Website GitHub

JaidedAI/EasyOCR

Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, and Cyrillic.

Stack

Python pip

GitHub topics

#cnn #crnn #data-mining #deep-learning #easyocr #image-processing

Updated: 2025-12-05
Lists: 1 list mention
First commit: 2020-03-14
History: 7 history points
License: Apache-2.0
Issues: 528 open

Stars: 29,723

Forks: 3,585
Commits: 619 commits
Star growth, last 7 days: No 7-day history
Commit velocity, last 7 days: No 7-day history

Website GitHub

tisfeng/Easydict

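A concise and elegant dictionary and translator macOS app for looking up words and translating text. Works out of the box and supports offline OCR, Youdao Dictionary, 🍎 Apple System Dictionary, 🍎 Apple System Translation, OpenAI, Gemini, DeepL, Google, Bing, Tencent, Baidu, Alibaba, NiuTrans, Caiyun, and Volcano translation.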

AI dev

Stack

Swift Bundler CocoaPods npm

GitHub topics

#app #baidu #bing #deepl #dictionary #gemini

Updated: 2026-07-02
Lists: 4 list mentions
First commit: 2022-10-30
History: 7 history points
License: GPL-3.0
Issues: 151 open

Stars: 13,740

Forks: 693
Commits: 3,962 commits
Star growth, last 7 days: No 7-day history
Commit velocity, last 7 days: No 7-day history

GitHub

getomni-ai/zerox

OCR & Document Extraction using vision models

Stack

TypeScript Jupyter pytest Tornado npm pip Poetry

GitHub topics

#ocr #pdf

Updated: 2025-05-20
Lists: 1 list mention
First commit: 2024-07-21
History: 4 history points
License: MIT
Issues: 89 open

Stars: 12,239

Forks: 846
Commits: 450 commits
Star growth, last 7 days: No 7-day history
Commit velocity, last 7 days: No 7-day history

Website GitHub

dataelement/bisheng

BISHENG is an open LLM devops platform for next generation Enterprise AI applications. Powerful and comprehensive features include: GenAI workflow, RAG, Agent, Unified model management, Evaluation, SFT, Dataset Management, Enterprise-level System Management, Observability and more.

Stack

TypeScript Celery FastAPI LangChain LlamaIndex npm PEP 517 uv

GitHub topics

#agent #ai #chatbot #enterprise #finetune #genai

Updated: 2026-07-03
Lists: 1 list mention
First commit: 2023-08-28
History: 4 history points
License: Apache-2.0
Issues: 113 open

Stars: 11,490

Forks: 1,876
Commits: 6,544 commits
Star growth, last 7 days: No 7-day history
Commit velocity, last 7 days: No 7-day history

Website GitHub

the-paperless-project/paperless

Scan, index, and archive all of your paper documents

Archived

Stack

Python Django Django REST Framework pytest pip Pipenv

GitHub topics

#archiving #documents #ocr #paper #search

Updated: 2021-04-06
Lists: 1 list mention
First commit: 2015-12-20
History: 2 history points
License: GPL-3.0
Issues: 177 open

Stars: 7,913

Forks: 499
Commits: 1,294 commits
Star growth, last 7 days: 0 0.0%
Commit velocity, last 7 days: 0 0.0%

GitHub

adithya-s-k/omniparse

Ingest, parse, and optimize any data format ➡️ from documents to multimedia ➡️ for enhanced compatibility with GenAI frameworks

Stack

Python FastAPI Gradio Poetry

GitHub topics

#ingestion-api #ocr #omniparser #parse-server #parser-library #vision-transformer

Updated: 2025-12-12
Lists: 1 list mention
First commit: 2024-06-04
History: 5 history points
License: GPL-3.0
Issues: 74 open

Stars: 7,631

Forks: 651
Commits: 122 commits
Star growth, last 7 days: 0 0.0%
Commit velocity, last 7 days: 0 0.0%

Website GitHub

ciur/papermerge

Open Source Document Management System for Digital Archives (Scanned Documents)

Stack

Python Celery Django Django REST Framework pip Poetry

GitHub topics

#archives #django #dms #document-management #ocr #paperless

Updated: 2025-11-23
Lists: 1 list mention
First commit: 2020-01-06
History: 5 history points
License: Apache-2.0
Issues: 132 open

Stars: 2,932

Forks: 308
Commits: 2,215 commits
Star growth, last 7 days: 0 0.0%
Commit velocity, last 7 days: 0 0.0%

Website GitHub

Sumanth077/Hands-On-AI-Engineering

A curated collection of practical AI projects implementing OCR systems, RAG, AI agents, and other AI use cases.

Stack

Python FastAPI Gradio LangChain Starlette PEP 517 pip uv

GitHub topics

#agents #ai #ai-agents #ai-engineering #generative-ai #llms

Updated: 2026-07-10
Lists: 0 list mentions
First commit: 2025-02-24
History: 26 history points
License: Unknown
Issues: 7 open

Stars: 2,650

Forks: 705
Commits: 556 commits
Star growth, last 7 days: +64 +2.5%
Commit velocity, last 7 days: 0 0.0%

Website GitHub

sismics/docs

Lightweight document management system packed with all the features you can expect from big expensive solutions

Stack

JavaScript Android Gradle Maven npm

GitHub topics

#cloud #dms #docker #document #enterprise #file-sharing

Updated: 2026-02-09
Lists: 1 list mention
First commit: 2013-07-27
History: 4 history points
License: GPL-2.0
Issues: 112 open

Stars: 2,552

Forks: 672
Commits: 1,125 commits
Star growth, last 7 days: No 7-day history
Commit velocity, last 7 days: No 7-day history

Website GitHub

icereed/paperless-gpt

Use LLMs and LLM Vision (OCR) to handle paperless-ngx - Document Digitalization powered by AI

AI dev

Stack

Go Gin gRPC Go React Tailwind CSS Go modules npm

GitHub topics

#ai #chatgpt #llm #mistral #ocr #ollama

Updated: 2026-06-24
Lists: 2 list mentions
First commit: 2024-09-23
History: 4 history points
License: MIT
Issues: 204 open

Stars: 2,469

Forks: 177
Commits: 629 commits
Star growth, last 7 days: No 7-day history
Commit velocity, last 7 days: No 7-day history

GitHub

Achno/gowall

A tool to convert a wallpaper's color scheme / palette, OCR with VLMs (Traditional & Hybrid), image compression, color palette extraction, image upscaling with Adversarial Networks, and more image processing features.

Stack

Go Cobra gRPC Go Go modules

GitHub topics

#background-removal #cli #color-extractor #color-palette #color-scheme #compression

Updated: 2026-06-10
Lists: 2 list mentions
First commit: 2024-07-22
History: 6 history points
License: MIT
Issues: 13 open

Stars: 2,281

Forks: 38
Commits: 195 commits
Star growth, last 7 days: 0 0.0%
Commit velocity, last 7 days: 0 0.0%

Website GitHub

eikek/docspell

Assist in organizing your piles of documents, resulting from scanners, e-mails and other sources, with minimal effort.

Stack

Elm Tailwind CSS npm Yarn

GitHub topics

#dms #docspell #document #document-management #document-management-system #edms

Updated: 2026-05-10
Lists: 2 list mentions
First commit: 2019-07-17
History: 6 history points
License: AGPL-3.0
Issues: 231 open

Stars: 2,270

Forks: 179
Commits: 5,590 commits
Star growth, last 7 days: 0 0.0%
Commit velocity, last 7 days: 0 0.0%

Website GitHub

amebalabs/TRex

Copy any text on your screen, stop retyping.

AI dev

Stack

Swift Swift Package Manager

GitHub topics

#macos #ocr #productivity #screenshot #swift #textrecognition

Updated: 2026-06-25
Lists: 1 list mention
First commit: 2021-02-19
History: 8 history points
License: MIT
Issues: 10 open

Stars: 1,839

Forks: 60
Commits: 159 commits
Star growth, last 7 days: No 7-day history
Commit velocity, last 7 days: No 7-day history

Website GitHub

enoch3712/ExtractThinker

ExtractThinker is a Document Intelligence library for LLMs, offering ORM-style interaction for flexible and powerful document workflows.

Stack

Python pytest Tornado pip Poetry

GitHub topics

#ai #document-image-analysis #document-intelligence #document-parsing #document-processing #langchain

Updated: 2025-08-27
Lists: 1 list mention
First commit: 2024-02-01
History: 4 history points
License: Apache-2.0
Issues: 34 open

Stars: 1,578

Forks: 155
Commits: 463 commits
Star growth, last 7 days: No 7-day history
Commit velocity, last 7 days: No 7-day history

Website GitHub

zelon88/HRConvert2

A self-hosted file conversion server & share tool that supports 445 file formats in 13 languages.

Stack

PHP

GitHub topics

#archiver #conversion #converter #document-conversion #extractor #file-converter

Updated: 2026-06-15
Lists: 1 list mention
First commit: 2018-02-24
History: 4 history points
License: GPL-3.0
Issues: 5 open

Stars: 1,350

Forks: 82
Commits: 818 commits
Star growth, last 7 days: No 7-day history
Commit velocity, last 7 days: No 7-day history

GitHub

lzhgus/Capso

Open-source screenshot and screen recording for macOS. The free, native alternative to CleanShot X. Built with Swift 6.0 and SwiftUI.

Stack

Swift Swift Package Manager

GitHub topics

#annotation #cleanshot-alternative #macos #ocr #open-source #screen-recording

Updated: 2026-06-24
Lists: 2 list mentions
First commit: 2026-04-10
History: 6 history points
License: NOASSERTION
Issues: 2 open

Stars: 938

Forks: 41
Commits: 255 commits
Star growth, last 7 days: No 7-day history
Commit velocity, last 7 days: No 7-day history

Website GitHub

CCExtractor/ccextractor

CCExtractor - Official version maintained by the core team

Stack

C Cargo CMake .NET SDK

GitHub topics

#c #cea-608 #cea-708 #dvb #hacktoberfest #hacktoberfest2021

Updated: 2026-06-23
Lists: 1 list mention
First commit: 2014-04-12
History: 2 history points
License: GPL-2.0
Issues: 32 open

Stars: 889

Forks: 576
Commits: 3,323 commits
Star growth, last 7 days: No 7-day history
Commit velocity, last 7 days: No 7-day history

Website GitHub

cosZone/MoePeek

A lightweight macOS selection translator built with pure Swift 6, featuring on-device Apple Translate for privacy, only 5MB install size and stable ~50MB memory usage.

AI dev

Stack

Swift Swift Package Manager

GitHub topics

#apple #macos #ocr #short #swift

Updated: 2026-05-26
Lists: 2 list mentions
First commit: 2026-02-16
History: 6 history points
License: AGPL-3.0
Issues: 9 open

Stars: 682

Forks: 40
Commits: 136 commits
Star growth, last 7 days: No 7-day history
Commit velocity, last 7 days: No 7-day history

GitHub

raphael-seo/Versatile-OCR-Program

Multi-modal OCR pipeline optimized for ML training (text, figure, math, tables, diagrams)

Stack

Python

GitHub topics

#doclayout #educational-data #exam-ocr #machine-learning #ml-datasets #multi-modal

Updated: 2026-05-13
Lists: 0 list mentions
First commit: 2025-04-01
History: 25 history points
License: NOASSERTION
Issues: 0 open

Stars: 677

Forks: 50
Commits: 65 commits
Star growth, last 7 days: -2 -0.3%
Commit velocity, last 7 days: 0 0.0%

GitHub

junhoyeo/BetterOCR

🔍 Better text detection by combining multiple OCR engines (EasyOCR, Tesseract, and Pororo) with 🧠 LLM.

Stack

Python pytest Poetry

GitHub topics

#ai #chatgpt #chatgpt-api #easyocr #llm #ocr

Updated: 2025-06-10
Lists: 1 list mention
First commit: 2023-10-26
History: 4 history points
License: MIT
Issues: 9 open

Stars: 637

Forks: 39
Commits: 94 commits
Star growth, last 7 days: No 7-day history
Commit velocity, last 7 days: No 7-day history

GitHub

papermerge/papermerge-core

Papermerge DMS core backend, REST API server, and frontend UI

Stack

Python Celery FastAPI pytest React npm uv Yarn

GitHub topics

#digital-archives #dms #document-management-system #documents #ocr #pdf

Updated: 2026-03-21
Lists: 1 list mention
First commit: 2020-12-23
History: 5 history points
License: Apache-2.0
Issues: 35 open

Stars: 506

Forks: 109
Commits: 1,971 commits
Star growth, last 7 days: 0 0.0%
Commit velocity, last 7 days: 0 0.0%

Website GitHub

mkucej/i-librarian-free

I, Librarian - open-source version of a PDF managing SaaS.

Stack

PHP

GitHub topics

#arxiv #crossref #document-management #doi #groupware #ieee

Updated: 2025-12-02
Lists: 1 list mention
First commit: 2020-04-07
History: 5 history points
License: GPL-3.0
Issues: 19 open

Stars: 341

Forks: 32
Commits: 315 commits
Star growth, last 7 days: 0 0.0%
Commit velocity, last 7 days: 0 0.0%

Website GitHub

stb-tester/stb-tester

Automated Testing for Set-Top Boxes and Smart TVs

Stack

Python PEP 517 pip

GitHub topics

#computer-vision #gstreamer #hdmi #hdmi-cec #lirc #numpy

Updated: 2026-05-20
Lists: 1 list mention
First commit: 2012-06-15
History: 2 history points
License: LGPL-2.1
Issues: 30 open

Stars: 194

Forks: 98
Commits: 3,787 commits
Star growth, last 7 days: No 7-day history
Commit velocity, last 7 days: No 7-day history

Website GitHub

alikia2x/openrewind

OpenRewind is a fully open-source, privacy-first alternative to rewind.ai. With OpenRewind, you can easily access your digital history, enhancing your memory and productivity without compromising your privacy.

Stack

TypeScript Electron React Tailwind CSS Vite Bun npm

GitHub topics

#alternative #memory-management #ocr #open-source-al #rewind-ai

Updated: 2025-01-31
Lists: 0 list mentions
First commit: 2024-06-03
History: 46 history points
License: GPL-3.0
Issues: 1 open

stars

Forks: 4
Commits: 120 commits
Star growth, last 7 days: 0 0.0%
Commit velocity, last 7 days: 0 0.0%

GitHub

lukebuild/LuxShot

Minimalist macOS OCR tool. Open-source, privacy-first, and built with SwiftUI.

Stack

Swift

GitHub topics

#macos #menu-bar-app #ocr #screenshot #swiftui #vision-framework

Updated: 2026-03-02
Lists: 2 list mentions
First commit: 2026-02-15
History: 6 history points
License: GPL-3.0
Issues: 0 open

stars

Forks: 4
Commits: 3 commits
Star growth, last 7 days: No 7-day history
Commit velocity, last 7 days: No 7-day history

GitHub
