AIMO-2 Winning Solution: Building State-of-the-Art Mathematical Reasoning Models with OpenMathReasoning dataset
Abstract
This paper presents our winning submission to the AI Mathematical Olympiad - Progress Prize 2 (AIMO-2) competition. Our recipe for building state-of-the-art mathematical reasoning models relies on three key pillars. First, we create a large-scale dataset comprising 540K unique high-quality math problems, including olympiad-level problems, and their 3.2M long-reasoning solutions. Second, we develop a novel method to integrate code execution with long-reasoning models through iterative training, generation, and quality filtering, resulting in 1.7M high-quality Tool-Integrated Reasoning solutions. Third, we create a pipeline to train models to select the most promising solution from many candidates. We show that such generative solution selection (GenSelect) can significantly improve upon a majority voting baseline. Combining these ideas, we train a series of models that achieve state-of-the-art results on mathematical reasoning benchmarks. To facilitate further research, we release our code, models, and the complete OpenMathReasoning dataset under a commercially permissive license.
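For context, the majority voting baseline that GenSelect improves upon simply returns the most frequent final answer across sampled candidate solutions. Below is a minimal Python sketch of that baseline; the function name and the normalization step are illustrative assumptions, not taken from the released code.

```python
from collections import Counter

def majority_vote(final_answers: list[str]) -> str:
    """Return the most frequent final answer among candidate solutions.

    This is the standard majority-voting (self-consistency) baseline;
    GenSelect instead trains a model to pick the most promising candidate.
    """
    # Normalize so trivially different renderings (e.g. "42 " vs "42")
    # count as the same answer. Real pipelines normalize more aggressively.
    normalized = [a.strip() for a in final_answers]
    counts = Counter(normalized)
    answer, _ = counts.most_common(1)[0]
    return answer

# Example: five sampled solutions whose extracted final answers disagree.
print(majority_vote(["42", "42", "41", "42 ", "40"]))  # -> "42"
```

Majority voting only sees the extracted final answers, so it discards the reasoning itself; GenSelect's advantage comes from judging full candidate solutions rather than counting answer frequencies.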
Community
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- OpenCodeReasoning: Advancing Data Distillation for Competitive Coding (2025)
- PromptCoT: Synthesizing Olympiad-level Problems for Mathematical Reasoning in Large Language Models (2025)
- M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models (2025)
- InftyThink: Breaking the Length Limits of Long-Context Reasoning in Large Language Models (2025)
- Leveraging Reasoning Model Answers to Enhance Non-Reasoning Model Capability (2025)
- Self-Evolved Preference Optimization for Enhancing Mathematical Reasoning in Small Language Models (2025)
- Think Twice: Enhancing LLM Reasoning by Scaling Multi-round Test-time Thinking (2025)
Models citing this paper: 6
Datasets citing this paper: 3
Spaces citing this paper: 0