Awesome

Awesome-list intelligence for GitHub

Search every repository hiding inside awesome lists.

Discover projects curated by awesome-list maintainers, then narrow them by stars, age, freshness, archive status, language, topics, generated tags, detected stacks, package managers, and source list.

Repos indexed: 9,926
Awesome lists tracked: 76
Current results: 2 repos shown
Topic: nlp-machine-learning

esbatmop/MNBVC

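MNBVC (Massive Never-ending BT Vast Chinese corpus) is a very large-scale Chinese corpus, benchmarked against the 40T of data used to train ChatGPT. The MNBVC dataset covers not only mainstream culture but also niche subcultures, and even text written in "Martian script" (火星文). It includes plain-text Chinese data in every form: news, essays, novels, books, magazines, papers, scripted dialogue, forum posts, wiki pages, classical poetry, song lyrics, product descriptions, jokes, embarrassing anecdotes, chat logs, and more.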

Tags: #chinese #chinese-language #chinese-nlp #chinese-simplified #corpus-data
Awesome lists: 2
Commits: 300
First commit: 2022-12-31
History points: 2
Updated: 2026-05-23