huggingface/trl
Train transformer language models with reinforcement learning.
A repo for distributed training of language models with Reinforcement Learning from Human Feedback (RLHF)
High-quality single-file implementations of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG)
Reinforcement Learning in PyTorch
Go ahead and axolotl questions
Democratizing Reinforcement Learning for LLMs
Implementation of RLHF (Reinforcement Learning from Human Feedback) on top of the PaLM architecture. Basically ChatGPT but with PaLM