Sign in
Awesome

GitHub projects from awesome lists

Search awesome repositories

Search names, descriptions, topics, tags, and stacks, then tune results by ecosystem, freshness, health, and cross-list signal.

Repos indexed
9,945
Awesome lists tracked
76
Current results
2

Find repositories

Start broad, then narrow by ecosystem, freshness, health, and growth.

Clear 1 refinement
Search mode
Tune results
More filters Topics, generated tags, stack, age, archive status, and growth.
Ecosystem
Health

Uses known first-commit dates.

Momentum
Reset filters
Highlighted

Open highlighted repo slot

Put your repository first

Promote a GitHub repo at the top of Awesome repository list views for 7 days.

esbatmop/MNBVC

MNBVC(Massive Never-ending BT Vast Chinese corpus)超大规模中文语料集。对标chatGPT训练的40T数据。MNBVC数据集不但包括主流文化,也包括各个小众文化甚至火星文的数据。MNBVC数据集包括新闻、作文、小说、书籍、杂志、论文、台词、帖子、wiki、古诗、歌词、商品介绍、笑话、糗事、聊天记录等一切形式的纯文本中文数据。

#chinese#chinese-language#chinese-nlp#chinese-simplified#corpus-data 2 awesome lists 300 commits first commit 2022-12-31 2 history points updated 2026-05-23