vwxyzjn/cleanrl
High-quality single file implementation of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG)
Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback
Easy and Efficient Finetuning LLMs. (Supported LLama, LLama2, LLama3, Qwen, Baichuan, GLM, Falcon) Efficient quantized training and deployment of large models.
Simple RL training for reasoning
R1-searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning
PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.
An Extensible Toolkit for Finetuning and Inference of Large Foundation Models. Large Models for All.