re paper - a that113 Collection

updated 1 day ago

Scaling RL to Long Videos

Paper • 2507.07966 • Published 22 days ago
Group Sequence Policy Optimization

Paper • 2507.18071 • Published 8 days ago
CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning

Paper • 2507.14111 • Published 14 days ago
MaPPO: Maximum a Posteriori Preference Optimization with Prior Knowledge

Paper • 2507.21183 • Published 5 days ago
SAND-Math: Using LLMs to Generate Novel, Difficult and Useful Mathematics Questions and Answers

Paper • 2507.20527 • Published 4 days ago
Beyond Binary Rewards: Training LMs to Reason About Their Uncertainty

Paper • 2507.16806 • Published 10 days ago
EDGE-GRPO: Entropy-Driven GRPO with Guided Error Correction for Advantage Diversity

Paper • 2507.21848 • Published 3 days ago
Geometric-Mean Policy Optimization

Paper • 2507.20673 • Published 4 days ago
A Survey of Self-Evolving Agents: On Path to Artificial Super Intelligence

Paper • 2507.21046 • Published 4 days ago
L0: Reinforcement Learning to Become General Agents

Paper • 2506.23667 • Published Jun 30, 2025
Stabilizing Knowledge, Promoting Reasoning: Dual-Token Constraints for RLVR

Paper • 2507.15778 • Published 11 days ago
SRFT: A Single-Stage Method with Supervised and Reinforcement Fine-Tuning for Reasoning

Paper • 2506.19767 • Published Jun 24, 2025
R-Search: Empowering LLM Reasoning with Search via Multi-Reward Reinforcement Learning

Paper • 2506.04185 • Published Jun 4, 2025
TreeRPO: Tree Relative Policy Optimization

Paper • 2506.05183 • Published Jun 5, 2025
TreeRL: LLM Reinforcement Learning with On-Policy Tree Search

Paper • 2506.11902 • Published Jun 13, 2025
Enhancing Mathematical Reasoning in LLMs by Stepwise Correction

Paper • 2410.12934 • Published Oct 16, 2024
ProcessBench: Identifying Process Errors in Mathematical Reasoning

Paper • 2412.06559 • Published Dec 9, 2024