GitHub projects from awesome lists
Search names, descriptions, topics, tags, and stacks, then tune results by ecosystem, freshness, health, and cross-list signal.
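MNBVC (Massive Never-ending BT Vast Chinese corpus): a very large-scale Chinese corpus, benchmarked against the 40T of data used to train ChatGPT. The MNBVC dataset covers not only mainstream culture but also niche subcultures and even "Martian script" (火星文) data. It includes plain-text Chinese data in every form: news, essays, novels, books, magazines, papers, scripts, forum posts, wiki, classical poetry, lyrics, product descriptions, jokes, embarrassing stories, chat logs, and more.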
AdalFlow: The library to build & auto-optimize LLM applications.
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
Module for automatic summarization of text documents and HTML pages.
Superfast AI decision making and intelligent processing of multi-modal data.
Models, data loaders and abstractions for language processing, powered by PyTorch
🛠 All-in-one web-based IDE specialized for machine learning and data science.
TextAttack 🐙 is a Python framework for adversarial attacks, data augmentation, and model training in NLP https://textattack.readthedocs.io/en/master/
Rust native ready-to-use NLP pipelines and transformer-based models (BERT, DistilBERT, GPT2,...)
Chinese-LLaMA 1 & 2 and Chinese-Falcon base models; ChatFlow Chinese dialogue model; Chinese OpenLLaMA model; NLP pre-training and instruction fine-tuning datasets
A library for debugging/inspecting machine learning classifiers and explaining their predictions
Assist in organizing your piles of documents from scanners, e-mails, and other sources with minimal effort.
🤖 A powerful, open source JavaScript library for ChatGPT
ContextGem: Effortless LLM extraction from documents
ExtractThinker is a Document Intelligence library for LLMs, offering ORM-style interaction for flexible and powerful document workflows.
Code repo for "WebArena: A Realistic Web Environment for Building Autonomous Agents"
🛸 Use pretrained transformers like BERT, XLNet and GPT-2 in spaCy
Human ChatGPT Comparison Corpus (HC3), Detectors, and more! 🔥
Trained models & code to predict toxic comments on all 3 Jigsaw Toxic Comment Challenges. Built using ⚡ PyTorch Lightning and 🤗 Transformers. For access to our API, please email us at contact@unitary.ai.
DataDreamer: Prompt. Generate Synthetic Data. Train & Align Models. 🤖💤
Pretrained model hub for Keras 3.
Running Llama 2 and other Open-Source LLMs on CPU Inference Locally for Document Q&A
Transformer models from BERT to GPT-4, environments from Hugging Face to OpenAI. Fine-tuning, training, and prompt engineering examples. A bonus section with ChatGPT, GPT-3.5-turbo, GPT-4, and DALL-E, including jump-starting GPT-4, speech-to-text, text-to-speech, text-to-image generation with DALL-E, Google Cloud AI, HuggingGPT, and more
Python bindings to libpostal for fast international address parsing/normalization
This repository introduces PIXIU, an open-source resource featuring the first financial large language models (LLMs), instruction tuning data, and evaluation benchmarks to holistically assess financial LLMs. Our goal is to continually push forward the open-source development of financial artificial intelligence (AI).
We introduced a new model designed for the Code generation task. Its test accuracy on the HumanEval base dataset surpasses that of GPT-4 Turbo (April 2024) and GPT-4o.
The online version is temporarily unavailable because we cannot afford the key. You can clone and run it locally. Note: we set a default OpenAI key. If the key exceeds its plan and becomes invalid, please tell us. The response speed depends on OpenAI (sometimes the official service is too crowded and slow).
DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents
Natural language detection library for Go