Update README.md
README.md CHANGED
@@ -21,8 +21,23 @@ tags:
 
 # AceReason-Nemotron: Advancing Math and Code Reasoning through Reinforcement Learning
 
+<p align="center">
+
+[](https://arxiv.org/abs/2505.16400)
+[](https://huggingface.co/datasets/nvidia/AceReason-Math)
+[](https://huggingface.co/collections/nvidia/acereason-682f4e1261dc22f697fd1485)
+[](https://huggingface.co/nvidia/AceReason-Nemotron-14B/blob/main/README_EVALUATION.md)
+
+</p>
+
 <img src="fig/main_fig.png" alt="main_fig" style="width: 600px; max-width: 100%;" />
 
+## 🔥News
+- **6/11/2025**: We share our evaluation toolkit at [AceReason Evaluation](https://huggingface.co/nvidia/AceReason-Nemotron-14B/blob/main/README_EVALUATION.md), including:
+  - scripts to run inference and scoring
+  - LiveCodeBench (avg@8): model prediction files and scores for each month (2023/5-2025/5)
+  - AIME24/25 (avg@64): model prediction files and scores
+- **6/2/2025**: We are excited to share our Math RL training dataset at [AceReason-Math](https://huggingface.co/datasets/nvidia/AceReason-Math)
+
 We're thrilled to introduce AceReason-Nemotron-7B, a math and code reasoning model trained entirely through reinforcement learning (RL), starting from DeepSeek-R1-Distilled-Qwen-7B. It delivers impressive results, achieving 69.0% on AIME 2024 (+14.5%), 53.6% on AIME 2025 (+17.4%), 51.8% on LiveCodeBench v5 (+8%), and 44.1% on LiveCodeBench v6 (+7%). We systematically study the RL training process through extensive ablations and propose a simple yet effective approach: first RL training on math-only prompts, then RL training on code-only prompts. Notably, we find that math-only RL significantly enhances the performance of strong distilled models not only on math benchmarks but also on code reasoning tasks. In addition, extended code-only RL further improves code benchmark performance while causing minimal degradation in math results. We find that RL not only elicits the foundational reasoning capabilities acquired during pre-training and supervised fine-tuning (e.g., distillation), but also pushes the limits of the model's reasoning ability, enabling it to solve problems that were previously unsolvable.
 
 We share our training recipe and training logs in our technical report.
@@ -106,7 +121,10 @@ else:
 final_prompt = "<|User|>" + question + "<|Assistant|><think>\n"
 ```
 5. Our inference engine for evaluation is **vLLM==0.7.3** using top-p=0.95, temperature=0.6, max_tokens=32768.
-
+
+## Evaluation Toolkit
+
+Please check the evaluation code, scripts, and cached prediction files at https://huggingface.co/nvidia/AceReason-Nemotron-14B/blob/main/README_EVALUATION.md
 
 
 ## Correspondence to
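The prompt template and step 5's decoding settings above can be sketched as a small vLLM script. Only the template string, the sampling values (top-p=0.95, temperature=0.6, max_tokens=32768), and the model name are taken from this page; the rest (the vLLM offline `LLM`/`SamplingParams` API, the local-import structure) is an illustrative assumption, not the authors' evaluation code.

```python
def build_prompt(question: str) -> str:
    # DeepSeek-R1-distill-style chat template used in the README's setup
    return "<|User|>" + question + "<|Assistant|><think>\n"

# Decoding settings from step 5 of the evaluation instructions
SAMPLING_KWARGS = dict(top_p=0.95, temperature=0.6, max_tokens=32768)

def generate(question: str, model: str = "nvidia/AceReason-Nemotron-7B") -> str:
    # Requires a GPU and `pip install vllm==0.7.3`; the import is kept local
    # so the prompt helper above works without vLLM installed.
    from vllm import LLM, SamplingParams

    llm = LLM(model=model)
    outputs = llm.generate([build_prompt(question)], SamplingParams(**SAMPLING_KWARGS))
    return outputs[0].outputs[0].text
```

For batched evaluation one would pass all prompts to `llm.generate` at once rather than looping, which is how vLLM amortizes weight loading and KV-cache allocation.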
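The News items report scores as avg@8 and avg@64. The page does not define the metric, but avg@k conventionally means the per-problem accuracy over k sampled generations, averaged across problems; here is a minimal sketch under that assumption (function name and data layout are illustrative):

```python
def avg_at_k(correct_flags: dict[str, list[bool]]) -> float:
    """avg@k (assumed convention): for each problem, the fraction of its
    k sampled generations that are correct, averaged over all problems."""
    per_problem = [sum(flags) / len(flags) for flags in correct_flags.values()]
    return sum(per_problem) / len(per_problem)

# e.g. two problems, k=4 samples each
scores = {"problem_1": [True, True, False, False],   # 0.50
          "problem_2": [True, True, True, True]}     # 1.00
print(avg_at_k(scores))  # 0.75
```

Averaging over many samples (k=64 for AIME) reduces the variance that a single greedy or sampled run would show on small benchmarks.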