PaperBench: Evaluating AI's Ability to Replicate AI Research Paper • 2504.01848 • Published 1 day ago • 20
Exploring the Effect of Reinforcement Learning on Video Understanding: Insights from SEED-Bench-R1 Paper • 2503.24376 • Published 3 days ago • 30
Expanding RL with Verifiable Rewards Across Diverse Domains Paper • 2503.23829 • Published 4 days ago • 16
What, How, Where, and How Well? A Survey on Test-Time Scaling in Large Language Models Paper • 2503.24235 • Published 3 days ago • 42
CLS-RL: Image Classification with Rule-Based Reinforcement Learning Paper • 2503.16188 • Published 14 days ago • 9
Fin-R1: A Large Language Model for Financial Reasoning through Reinforcement Learning Paper • 2503.16252 • Published 14 days ago • 27
DAPO: An Open-Source LLM Reinforcement Learning System at Scale Paper • 2503.14476 • Published 16 days ago • 112
R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization Paper • 2503.12937 • Published 18 days ago • 27
VisualPRM: An Effective Process Reward Model for Multimodal Reasoning Paper • 2503.10291 • Published 22 days ago • 32
Self-Taught Self-Correction for Small Language Models Paper • 2503.08681 • Published 23 days ago • 13
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning Paper • 2503.09516 • Published 22 days ago • 27
Light-R1: Curriculum SFT, DPO and RL for Long COT from Scratch and Beyond Paper • 2503.10460 • Published 21 days ago • 27
Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning Paper • 2503.07572 • Published 24 days ago • 40
LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL Paper • 2503.07536 • Published 24 days ago • 83
MM-Eureka: Exploring Visual Aha Moment with Rule-based Large-scale Reinforcement Learning Paper • 2503.07365 • Published 24 days ago • 55
Big-Math Collection This collection contains assets associated with the Big-Math dataset, a high-quality collection of over 250,000 math questions with verifiable answers • 3 items • Updated 28 days ago • 4