Article Kimina-Prover: Applying Test-time RL Search on Large Formal Reasoning Models By AI-MO and 17 others • 3 days ago • 33
Spurious Rewards Collection Spurious Rewards: Rethinking Training Signals in RLVR • 14 items • Updated 29 days ago • 2
One-Shot RLVR Collection Collections of models and papers for works: "Reinforcement Learning for Reasoning in Large Language Models with One Training Example" • 14 items • Updated 30 days ago • 1