Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model Paper • 2503.24290 • Published 8 days ago • 59
I Have Covered All the Bases Here: Interpreting Reasoning Features in Large Language Models via Sparse Autoencoders Paper • 2503.18878 • Published 15 days ago • 113
DAPO: An Open-Source LLM Reinforcement Learning System at Scale Paper • 2503.14476 • Published 21 days ago • 115
LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL Paper • 2503.07536 • Published 29 days ago • 84
Large Language Model Agent: A Survey on Methodology, Applications and Challenges Paper • 2503.21460 • Published 12 days ago • 71
Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't Paper • 2503.16219 • Published 19 days ago • 46
MLGym: A New Framework and Benchmark for Advancing AI Research Agents Paper • 2502.14499 • Published Feb 20 • 190
Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling Paper • 2502.06703 • Published Feb 10 • 149
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach Paper • 2502.05171 • Published Feb 7 • 136
SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution Paper • 2502.18449 • Published Feb 25 • 73
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning Paper • 2501.12948 • Published Jan 22 • 374
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking Paper • 2501.04519 • Published Jan 8 • 275
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training Paper • 2501.17161 • Published Jan 28 • 119
Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training Paper • 2501.11425 • Published Jan 20 • 104
Training Large Language Models to Reason in a Continuous Latent Space Paper • 2412.06769 • Published Dec 9, 2024 • 82
OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models Paper • 2411.04905 • Published Nov 7, 2024 • 124
Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization Paper • 2411.10442 • Published Nov 15, 2024 • 81
Large Language Models Can Self-Improve in Long-context Reasoning Paper • 2411.08147 • Published Nov 12, 2024 • 66
Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems Paper • 2504.01990 • Published 8 days ago • 209
T1: Tool-integrated Self-verification for Test-time Compute Scaling in Small Language Models Paper • 2504.04718 • Published 1 day ago • 26