Search-R1-v0.3 Collection: RL with outcome reward + format reward. https://arxiv.org/abs/2505.15117 • 11 items • Updated 1 day ago
Search-R1-v0.2 Collection: Exploration of a more stable RL pipeline, using outcome-only reward and scaled-up LLMs. https://arxiv.org/abs/2503.09516 • 25 items • Updated 1 day ago
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning. Paper • 2503.09516 • Published Mar 12