hkust-nlp/simpleRL-reason
Simple RL training for reasoning
Minimal reproduction of DeepSeek R1-Zero
ZeroSearch: Incentivize the Search Capability of LLMs without Searching
No description.
Dr. Zero: Self-Evolving Search Agents without Training Data
R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning
Scaling Deep Research via Reinforcement Learning in Real-world Environments