# AceReason-Nemotron: Advancing Math and Code Reasoning through Reinforcement Learning

<p align="center">

[![Technical Report](https://img.shields.io/badge/2505.16400-Technical_Report-blue)](https://arxiv.org/abs/2505.16400)
[![Dataset](https://img.shields.io/badge/🤗-Math_RL_Dataset-blue)](https://huggingface.co/datasets/nvidia/AceReason-Math)
[![Models](https://img.shields.io/badge/🤗-Models-blue)](https://huggingface.co/collections/nvidia/acereason-682f4e1261dc22f697fd1485)
[![Eval Toolkit](https://img.shields.io/badge/🤗-Eval_Code-blue)](https://huggingface.co/nvidia/AceReason-Nemotron-14B/blob/main/README_EVALUATION.md)
</p>

<img src="fig/main_fig.png" alt="main_fig" style="width: 600px; max-width: 100%;" />

## 🔥 News
- **6/11/2025**: We share our evaluation toolkit at [AceReason Evaluation](https://huggingface.co/nvidia/AceReason-Nemotron-14B/blob/main/README_EVALUATION.md), including:
  - scripts to run inference and scoring
  - LiveCodeBench (avg@8): model prediction files and scores for each month (2023/5-2025/5)
  - AIME24/25 (avg@64): model prediction files and scores (avg@k is sketched after this list)
- **6/2/2025**: We are excited to share our Math RL training dataset at [AceReason-Math](https://huggingface.co/datasets/nvidia/AceReason-Math).

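The avg@k numbers above average per-problem accuracy over k sampled generations. A minimal sketch of that computation, assuming predictions are stored as per-problem lists of correctness flags (an illustrative structure, not the toolkit's actual file format):

```python
from statistics import mean

def avg_at_k(correct: dict[str, list[bool]]) -> float:
    """avg@k: mean over problems of the fraction of k samples judged correct."""
    return mean(mean(flags) for flags in correct.values())

# Illustrative avg@8 over two problems (hypothetical ids and judgments).
correct = {
    "aime24-01": [True] * 6 + [False] * 2,  # 6 of 8 samples correct
    "aime24-02": [False] * 8,               # 0 of 8 samples correct
}
print(avg_at_k(correct))  # 0.375
```
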
We're thrilled to introduce AceReason-Nemotron-7B, a math and code reasoning model trained entirely through reinforcement learning (RL), starting from DeepSeek-R1-Distill-Qwen-7B. It delivers impressive results, achieving 69.0% on AIME 2024 (+14.5%), 53.6% on AIME 2025 (+17.4%), 51.8% on LiveCodeBench v5 (+8%), and 44.1% on LiveCodeBench v6 (+7%). We systematically study the RL training process through extensive ablations and propose a simple yet effective approach: first RL training on math-only prompts, then RL training on code-only prompts. Notably, we find that math-only RL significantly enhances the performance of strong distilled models not only on math benchmarks but also on code reasoning tasks. In addition, extended code-only RL further improves code benchmark performance while causing minimal degradation in math results. We find that RL not only elicits the foundational reasoning capabilities acquired during pre-training and supervised fine-tuning (e.g., distillation), but also pushes the limits of the model's reasoning ability, enabling it to solve problems that were previously unsolvable.

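In other words, training proceeds in two sequential RL stages: math-only prompts first, then code-only prompts continuing from the math-RL checkpoint. A schematic sketch of that staging, where `load_prompts`, `rl_train`, and the reward descriptions are hypothetical stubs rather than our actual training stack:

```python
# Schematic only: `load_prompts` and `rl_train` are hypothetical stubs
# standing in for a real RL training stack; only the stage ordering
# (math-only RL, then code-only RL) reflects the recipe described above.

def load_prompts(source: str) -> list[str]:
    """Stub: load RL prompts from a dataset identifier."""
    return [f"<prompt from {source}>"]

def rl_train(model: str, prompts: list[str], reward: str) -> str:
    """Stub: run RL with a verifiable reward; return the new checkpoint name."""
    return f"{model} -> RL[{reward}] on {len(prompts)} prompts"

# Stage 1: math-only RL from the distilled base model.
ckpt = rl_train(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-7B",
    prompts=load_prompts("nvidia/AceReason-Math"),
    reward="final answer matches",  # verifiable math reward (assumption)
)

# Stage 2: extended code-only RL, continuing from the math-RL checkpoint.
ckpt = rl_train(
    model=ckpt,
    prompts=load_prompts("<code RL prompts>"),  # placeholder identifier
    reward="unit tests pass",       # verifiable code reward (assumption)
)
print(ckpt)
```
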
We share our training recipe and training logs in our technical report.

```python
final_prompt = "<|User|>" + question + "<|Assistant|><think>\n"
```
5. Our inference engine for evaluation is **vLLM==0.7.3** using top-p=0.95, temperature=0.6, max_tokens=32768 (see the sketch below).
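
For reference, a minimal sketch of that evaluation setup with vLLM, reusing the `final_prompt` template shown above (the question here is a placeholder):

```python
from vllm import LLM, SamplingParams

# Sampling settings from step 5: top-p=0.95, temperature=0.6, max_tokens=32768.
sampling_params = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=32768)

llm = LLM(model="nvidia/AceReason-Nemotron-7B")

question = "Compute the remainder when 7^100 is divided by 13."  # placeholder
final_prompt = "<|User|>" + question + "<|Assistant|><think>\n"

outputs = llm.generate([final_prompt], sampling_params)
print(outputs[0].outputs[0].text)
```
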
## Evaluation Toolkit

Please check the evaluation code, scripts, and cached prediction files at https://huggingface.co/nvidia/AceReason-Nemotron-14B/blob/main/README_EVALUATION.md


## Correspondence to