Reasoning Models
Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs
Paper
•
2501.18585
•
Published
•
61
LLMs Can Easily Learn to Reason from Demonstrations Structure, not content, is what matters!
Paper
•
2502.07374
•
Published
•
40
Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling
Paper
•
2502.06703
•
Published
•
153
S*: Test Time Scaling for Code Generation
Paper
•
2502.14382
•
Published
•
63
START: Self-taught Reasoner with Tools
Paper
•
2503.04625
•
Published
•
113
R1-Omni: Explainable Omni-Multimodal Emotion Recognition with Reinforcing Learning
Paper
•
2503.05379
•
Published
•
38
R1-Zero's "Aha Moment" in Visual Reasoning on a 2B Non-SFT Model
Paper
•
2503.05132
•
Published
•
57
Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching
Paper
•
2503.05179
•
Published
•
46
MM-Eureka: Exploring Visual Aha Moment with Rule-based Large-scale Reinforcement Learning
Paper
•
2503.07365
•
Published
•
61
LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL
Paper
•
2503.07536
•
Published
•
88
Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey
Paper
•
2503.12605
•
Published
•
35
R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization
Paper
•
2503.12937
•
Published
•
30
DAPO: An Open-Source LLM Reinforcement Learning System at Scale
Paper
•
2503.14476
•
Published
•
144
DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding
Paper
•
2503.12797
•
Published
•
32
Unlocking Efficient Long-to-Short LLM Reasoning with Model Merging
Paper
•
2503.20641
•
Published
•
10
Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model
Paper
•
2503.24290
•
Published
•
62
Effectively Controlling Reasoning Models through Thinking Intervention
Paper
•
2503.24370
•
Published
•
19
A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond
Paper
•
2503.21614
•
Published
•
42
Exploring the Effect of Reinforcement Learning on Video Understanding: Insights from SEED-Bench-R1
Paper
•
2503.24376
•
Published
•
38
Inference-Time Scaling for Complex Tasks: Where We Stand and What Lies Ahead
Paper
•
2504.00294
•
Published
•
10
Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning
Paper
•
2506.01939
•
Published
•
187
Learning What Reinforcement Learning Can't: Interleaved Online Fine-Tuning for Hardest Questions
Paper
•
2506.07527
•
Published
•
3
The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity
Paper
•
2506.06941
•
Published
•
15
Reinforcement Pre-Training
Paper
•
2506.08007
•
Published
•
263
Thinking vs. Doing: Agents that Reason by Scaling Test-Time Interaction
Paper
•
2506.07976
•
Published
•
6
ReasonFlux-PRM: Trajectory-Aware PRMs for Long Chain-of-Thought Reasoning in LLMs
Paper
•
2506.18896
•
Published
•
29
Kwai Keye-VL Technical Report
Paper
•
2507.01949
•
Published
•
130
RLPR: Extrapolating RLVR to General Domains without Verifiers
Paper
•
2506.18254
•
Published
•
31
Perception-Aware Policy Optimization for Multimodal Reasoning
Paper
•
2507.06448
•
Published
•
47
Test-Time Scaling with Reflective Generative Model
Paper
•
2507.01951
•
Published
•
107
Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities
Paper
•
2507.06261
•
Published
•
64
Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning
Paper
•
2507.05255
•
Published
•
74
Towards Agentic RAG with Deep Reasoning: A Survey of RAG-Reasoning Systems in LLMs
Paper
•
2507.09477
•
Published
•
86
The Invisible Leash: Why RLVR May Not Escape Its Origin
Paper
•
2507.14843
•
Published
•
85
Group Sequence Policy Optimization
Paper
•
2507.18071
•
Published
•
316
LAPO: Internalizing Reasoning Efficiency via Length-Adaptive Policy Optimization
Paper
•
2507.15758
•
Published
•
35
THU-KEG/LongWriter-Zero-32B
Text Generation
•
33B
•
Updated
•
38
•
110
MUR: Momentum Uncertainty guided Reasoning for Large Language Models
Paper
•
2507.14958
•
Published
•
46
Agentic Reinforced Policy Optimization
Paper
•
2507.19849
•
Published
•
158
Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning
Paper
•
2508.08221
•
Published
•
50
We-Math 2.0: A Versatile MathBook System for Incentivizing Visual Mathematical Reasoning
Paper
•
2508.10433
•
Published
•
144
GTAlign: Game-Theoretic Alignment of LLM Assistants for Mutual Welfare
Paper
•
2510.08872
•
Published
•
3
RL makes MLLMs see better than SFT
Paper
•
2510.16333
•
Published
•
48
Scaling Latent Reasoning via Looped Language Models
Paper
•
2510.25741
•
Published
•
221
Supervised Reinforcement Learning: From Expert Trajectories to Step-wise Reasoning
Paper
•
2510.25992
•
Published
•
45
OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe
Paper
•
2511.16334
•
Published
•
92
Fantastic Reasoning Behaviors and Where to Find Them: Unsupervised Discovery of the Reasoning Process
Paper
•
2512.23988
•
Published
•
12