Rethinking RL for LLM Reasoning: It's Sparse Policy Selection, Not Capability Learning
Paper • 2605.06241 • Published • 4
Precise Debugging Benchmark: Is Your Model Debugging or Regenerating?