CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning • Paper • 2507.14111 • Published 14 days ago • 19
MaPPO: Maximum a Posteriori Preference Optimization with Prior Knowledge • Paper • 2507.21183 • Published 5 days ago • 10
SAND-Math: Using LLMs to Generate Novel, Difficult and Useful Mathematics Questions and Answers • Paper • 2507.20527 • Published 4 days ago • 4
Beyond Binary Rewards: Training LMs to Reason About Their Uncertainty • Paper • 2507.16806 • Published 10 days ago • 5
EDGE-GRPO: Entropy-Driven GRPO with Guided Error Correction for Advantage Diversity • Paper • 2507.21848 • Published 3 days ago • 5
A Survey of Self-Evolving Agents: On Path to Artificial Super Intelligence • Paper • 2507.21046 • Published 4 days ago • 68
Stabilizing Knowledge, Promoting Reasoning: Dual-Token Constraints for RLVR • Paper • 2507.15778 • Published 11 days ago • 19
SRFT: A Single-Stage Method with Supervised and Reinforcement Fine-Tuning for Reasoning • Paper • 2506.19767 • Published Jun 24 • 13
R-Search: Empowering LLM Reasoning with Search via Multi-Reward Reinforcement Learning • Paper • 2506.04185 • Published Jun 4
Enhancing Mathematical Reasoning in LLMs by Stepwise Correction • Paper • 2410.12934 • Published Oct 16, 2024 • 1
ProcessBench: Identifying Process Errors in Mathematical Reasoning • Paper • 2412.06559 • Published Dec 9, 2024 • 84