RUCAIBox/R1-Searcher
R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning
ToolRM: Towards Agentic Tool-Use Reward Modeling
Democratizing Reinforcement Learning for LLMs
Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the PaLM architecture. Basically ChatGPT but with PaLM
[AAAI 2026] AutoTool: Efficient Tool Selection for Large Language Model Agents
OpenClaw-RL: Train any agent simply by talking
ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning & ReCall: Learning to Reason with Tool Call for LLMs via Reinforcement Learning