Search awesome repositories

esbatmop/MNBVC

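MNBVC (Massive Never-ending BT Vast Chinese corpus) is an ultra-large-scale Chinese corpus, benchmarked against the 40T of data used to train ChatGPT. The MNBVC dataset covers not only mainstream culture but also various niche subcultures, including text written in "Martian script" (火星文). It includes news, essays, novels, books, magazines, papers, scripted dialogue, forum posts, wiki articles, classical poetry, song lyrics, product descriptions, jokes, embarrassing anecdotes, chat logs, and every other form of plain-text Chinese data.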

#chinese #chinese-language #chinese-nlp #chinese-simplified #corpus-data 2 awesome lists 300 commits first commit 2022-12-31 2 history points updated 2026-05-23

★ 4,199

GitHub ↗

SylphAI-Inc/AdalFlow

AdalFlow: The library to build & auto-optimize LLM applications.

Python #agent #ai #auto-prompting #bm25 #chatbot 2 awesome lists 1935 commits 1 history point updated 2026-05-25

★ 4,151

Website ↗ GitHub ↗

ModelTC/LightLLM

LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.

Python #deep-learning #gpt #llama #llm #model-serving 1 awesome list 844 commits 1 history point updated 2026-05-25

★ 4,077

GitHub ↗

miso-belica/sumy

Module for automatic summarization of text documents and HTML pages.

Python #html-extraction #html-extractor #html-page #lsa #nlp 1 awesome list 499 commits first commit 2013-02-20 4 history points updated 2026-03-31

★ 3,688

Website ↗ GitHub ↗

aurelio-labs/semantic-router

Superfast AI decision making and intelligent processing of multi-modal data.

Python pytest Tornado uv #ai #artificial-intelligence #chatbot #computer-vision #generative-ai AI dev signals 1 awesome list 2367 commits first commit 2023-10-30 2 history points updated 2026-05-23

★ 3,572

Website ↗ GitHub ↗

pytorch/text

Models, data loaders and abstractions for language processing, powered by PyTorch

Python #data-loader #dataset #deep-learning #models #nlp archived 1 awesome list 1313 commits first commit 2016-12-12 3 history points updated 2025-09-10

★ 3,560

Website ↗ GitHub ↗

ml-tooling/ml-workspace

🛠 All-in-one web-based IDE specialized for machine learning and data science.

Jupyter Notebook #anaconda #data-analysis #data-science #data-visualization #deep-learning 3 awesome lists 847 commits 2 history points updated 2024-07-26

★ 3,538

Website ↗ GitHub ↗

QData/TextAttack

TextAttack 🐙 is a Python framework for adversarial attacks, data augmentation, and model training in NLP. Documentation: https://textattack.readthedocs.io/en/master/

Python #adversarial-attacks #adversarial-examples #adversarial-machine-learning #data-augmentation #machine-learning AI dev signals 1 awesome list 2792 commits 1 history point updated 2026-04-17

★ 3,426

Website ↗ GitHub ↗

guillaume-be/rust-bert

Rust native ready-to-use NLP pipelines and transformer-based models (BERT, DistilBERT, GPT2,...)

Rust #bart #bert #deep-learning #electra #gpt 3 awesome lists 1 history point updated 2026-01-13

★ 3,061

Website ↗ GitHub ↗

CVI-SZU/Linly

Chinese-LLaMA 1 & 2 and Chinese-Falcon base models; ChatFlow Chinese conversational model; Chinese OpenLLaMA model; NLP pre-training and instruction fine-tuning datasets

Python #bert #chatbot #chatgpt #chinese #chinese-nlp 2 awesome lists 125 commits first commit 2023-03-21 2 history points updated 2024-04-14

★ 3,052

GitHub ↗

TeamHG-Memex/eli5

A library for debugging/inspecting machine learning classifiers and explaining their predictions

Jupyter Notebook #crfsuite #data-science #explanation #inspection #lightgbm AI dev signals 2 awesome lists 1268 commits 3 history points updated 2026-04-08

★ 2,775

Website ↗ GitHub ↗

eikek/docspell

Assist in organizing your piles of documents from scanners, e-mails, and other sources with minimal effort.

Elm #dms #docspell #document #document-management #document-management-system 2 awesome lists 5590 commits 2 history points updated 2026-05-10

★ 2,244

Website ↗ GitHub ↗

KudoAI/chatgpt.js

🤖 A powerful, open source JavaScript library for ChatGPT

JavaScript #ai #artificial-intelligence #bot #chat #chatbot 2 awesome lists 6303 commits first commit 2023-03-15 4 history points updated 2026-05-31

★ 2,037

GitHub ↗

shcherbak-ai/contextgem

ContextGem: Effortless LLM extraction from documents

Python #ai #contract-analysis #data-extraction #document-intelligence #docx AI dev signals 1 awesome list 195 commits 1 history point updated 2026-05-07

★ 1,844

Website ↗ GitHub ↗

enoch3712/ExtractThinker

ExtractThinker is a Document Intelligence library for LLMs, offering ORM-style interaction for flexible and powerful document workflows.

Python #ai #document-image-analysis #document-intelligence #document-parsing #document-processing 1 awesome list 1 history point updated 2025-08-27

★ 1,547

Website ↗ GitHub ↗

web-arena-x/webarena

Code repo for "WebArena: A Realistic Web Environment for Building Autonomous Agents"

Python #agent #nlp 2 awesome lists 203 commits first commit 2023-07-24 1 history point updated 2025-11-26

★ 1,489

Website ↗ GitHub ↗

explosion/spacy-transformers

🛸 Use pretrained transformers like BERT, XLNet and GPT-2 in spaCy

Python #bert #google #gpt-2 #huggingface #language-model 1 awesome list 1 history point updated 2026-03-27

★ 1,406

Website ↗ GitHub ↗

Hello-SimpleAI/chatgpt-comparison-detection

Human ChatGPT Comparison Corpus (HC3), Detectors, and more! 🔥

Python #ai #chatbot #chatgpt #dataset #deep-learning 1 awesome list 39 commits first commit 2023-01-10 1 history point updated 2023-12-01

★ 1,354

Website ↗ GitHub ↗

unitaryai/detoxify

Trained models & code to predict toxic comments on all 3 Jigsaw Toxic Comment Challenges. Built using ⚡ PyTorch Lightning and 🤗 Transformers. For access to our API, please email us at contact@unitary.ai.

Python #bert #bert-model #hate-speech #hate-speech-detection #hatespeech 1 awesome list 267 commits 1 history point updated 2026-04-06

★ 1,246

Website ↗ GitHub ↗

datadreamer-dev/DataDreamer

DataDreamer: Prompt. Generate Synthetic Data. Train & Align Models. 🤖💤

Python #alignment #deep-learning #fine-tuning #gpt #instruction-tuning 1 awesome list 83 commits first commit 2024-02-01 1 history point updated 2025-02-02

★ 1,112

Website ↗ GitHub ↗

keras-team/keras-hub

Pretrained model hub for Keras 3.

Python #cv #deep-learning #jax #keras #llm AI dev signals 1 awesome list 1534 commits first commit 2020-09-12 3 history points updated 2026-05-28

★ 980

Website ↗ GitHub ↗

kennethleungty/Llama-2-Open-Source-LLM-CPU-Inference

Running Llama 2 and other open-source LLMs locally with CPU inference for document Q&A

Python #c-transformers #chatgpt #cpu #cpu-inference #deep-learning 1 awesome list 1 history point updated 2023-11-06

★ 974

Website ↗ GitHub ↗

Denis2054/Transformers-for-NLP-2nd-Edition

Transformer models from BERT to GPT-4, environments from Hugging Face to OpenAI. Fine-tuning, training, and prompt engineering examples. A bonus section with ChatGPT, GPT-3.5-turbo, GPT-4, and DALL-E, including jump-starting GPT-4, speech-to-text, text-to-speech, text-to-image generation with DALL-E, Google Cloud AI, HuggingGPT, and more.

Jupyter Notebook #bert #chatgpt #chatgpt-api #dall-e #dall-e-api 2 awesome lists 634 commits first commit 2022-03-03 2 history points updated 2024-01-04

★ 962

Website ↗ GitHub ↗

cltk/cltk

The Classical Language Toolkit

Python #ai #greek #historical-linguistics #latin #ling 2 awesome lists 4032 commits first commit 2014-01-12 3 history points updated 2026-02-12

★ 907

Website ↗ GitHub ↗

openvenues/pypostal

Python bindings to libpostal for fast international address parsing/normalization

C #address #address-parser #binding #international #nlp 1 awesome list 114 commits first commit 2016-01-26 2 history points updated 2025-11-01

★ 879

GitHub ↗

The-FinAI/PIXIU

This repository introduces PIXIU, an open-source resource featuring the first financial large language models (LLMs), instruction tuning data, and evaluation benchmarks to holistically assess financial LLMs. Our goal is to continually push forward the open-source development of financial artificial intelligence (AI).

Jupyter Notebook #aifinance #chatgpt #fintech #gpt-4 #large-language-models 2 awesome lists 262 commits first commit 2023-06-02 1 history point updated 2025-03-04

★ 863

GitHub ↗

bin123apple/AutoCoder

We introduced a new model designed for the code generation task. Its test accuracy on the HumanEval base dataset surpasses that of GPT-4 Turbo (April 2024) and GPT-4o.

Python #code-generation #code-interpreter #humaneval #llm #nlp 1 awesome list 52 commits first commit 2024-05-13 1 history point updated 2024-07-06

★ 850

Website ↗ GitHub ↗

cocacola-lab/ChatIE

The online version is temporarily unavailable because we cannot afford the API key. You can clone and run it locally. Note: we set a default OpenAI key. If the keys exceed their plan and become invalid, please let us know. Response speed depends on OpenAI (sometimes the official service is crowded and slow).

Python #ai #chatgpt #chatgpt-app #event-extraciton #event-extraction 2 awesome lists 92 commits first commit 2023-03-03 2 history points updated 2024-05-28

★ 826

Website ↗ GitHub ↗

Ayanami0730/deep_research_bench

DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents

Python #agent #benchmark #deepresearch #nlp 1 awesome list 44 commits first commit 2025-06-13 2 history points updated 2026-05-11

★ 735

Website ↗ GitHub ↗

abadojack/whatlanggo

Natural language detection library for Go

Go #go #language #nlp #text-processing 1 awesome list 34 commits first commit 2017-02-28 2 history points updated 2023-03-28

★ 688

GitHub ↗
