Add link to paper and GitHub repo

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +40 -29
README.md CHANGED
@@ -1,26 +1,24 @@
1
-
2
-
3
-
4
  ---
 
 
5
  library_name: transformers
6
  license: other
7
  license_name: nvidia-open-model-license
8
- license_link: >-
9
- https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/
10
  pipeline_tag: text-generation
11
- language:
12
- - en
13
  tags:
14
- - nvidia
15
- - reasoning
16
- - math
17
- - code
18
- - reinforcement learning
19
- - pytorch
20
  ---
21
 
22
  # AceReason-Nemotron: Advancing Math and Code Reasoning through Reinforcement Learning
23
 
 
 
24
  <p align="center">
25
 
26
  [![Technical Report](https://img.shields.io/badge/2505.16400-Technical_Report-blue)](https://arxiv.org/abs/2505.16400)
@@ -33,7 +31,7 @@ tags:
33
 
34
  ## 🔥News
35
  - **6/16/2025**: We are excited to share our new release combining SFT with RL: **AceReason-Nemotron-1.1-7B**
36
- - Paper: https://arxiv.org/pdf/2506.13284
37
  - Model: https://huggingface.co/nvidia/AceReason-Nemotron-1.1-7B
38
  - 4M SFT Data: https://huggingface.co/datasets/nvidia/AceReason-1.1-SFT
39
  - **6/11/2025**: We share our evaluation toolkit at [AceReason Evaluation](https://huggingface.co/nvidia/AceReason-Nemotron-14B/blob/main/README_EVALUATION.md) including:
@@ -68,10 +66,6 @@ We evaluate our model against competitive reasoning models of comparable size wi
68
  | [AceReason-Nemotron-7B 🤗](https://huggingface.co/nvidia/AceReason-Nemotron-7B)| 69.0 | 53.6 | 51.8 | 44.1 |
69
  | [AceReason-Nemotron-14B 🤗](https://huggingface.co/nvidia/AceReason-Nemotron-14B)| 78.6 | 67.4 | 61.1 | 54.9 |
70
 
71
-
72
-
73
-
74
-
75
  ## How to use
76
  ```python
77
  import torch
@@ -104,7 +98,6 @@ generated_ids = [
104
  response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
105
  ```
106
 
107
-
108
  ## Usage Recommendations
109
 
110
  1. Don't include a system prompt; instead, place all instructions directly in the user prompt.
@@ -114,15 +107,33 @@ response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
114
  question = "" # code question
115
  starter_code = "" # starter code function header
116
 
117
- code_instruction_nostartercode = """Write Python code to solve the problem. Please place the solution code in the following format:\n```python\n# Your solution code here\n```"""
118
- code_instruction_hasstartercode = """Please place the solution code in the following format:\n```python\n# Your solution code here\n```"""
 
 
 
 
 
 
119
  if starter_code != "":
120
-     question += "\n\n" + "Solve the problem starting with the provided function header.\n\nFunction header:\n" + "```\n" + starter_code + "\n```"
121
-     question += "\n\n" + code_instruction_hasstartercode
122
  else:
123
-     question += "\n\n" + code_instruction_nostartercode
124
 
125
- final_prompt = "<|User|>" + question + "<|Assistant|><think>\n"
 
126
  ```
127
  4. Our inference engine for evaluation is **vLLM==0.7.3** using top-p=0.95, temperature=0.6, max_tokens=32768.
128
 
@@ -130,15 +141,16 @@ final_prompt = "<|User|>" + question + "<|Assistant|><think>\n"
130
 
131
  Please check the evaluation code, scripts, and cached prediction files at https://huggingface.co/nvidia/AceReason-Nemotron-14B/blob/main/README_EVALUATION.md
132
 
 
 
 
133
 
134
  ## Correspondence to
135
  Yang Chen ([email protected]), Zhuolin Yang ([email protected]), Zihan Liu ([email protected]), Chankyu Lee ([email protected]), Wei Ping ([email protected])
136
 
137
-
138
  ## License
139
  Your use of this model is governed by the [NVIDIA Open Model License](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/).
140
 
141
-
142
  ## Citation
143
  ```
144
  @article{chen2025acereason,
@@ -147,5 +159,4 @@ Your use of this model is governed by the [NVIDIA Open Model License](https://ww
147
  journal={arXiv preprint arXiv:2505.16400},
148
  year={2025}
149
  }
150
- ```
151
-
 
 
 
 
1
  ---
2
+ language:
3
+ - en
4
  library_name: transformers
5
  license: other
6
  license_name: nvidia-open-model-license
7
+ license_link: https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/
 
8
  pipeline_tag: text-generation
 
 
9
  tags:
10
+ - nvidia
11
+ - reasoning
12
+ - math
13
+ - code
14
+ - reinforcement learning
15
+ - pytorch
16
  ---
17
 
18
  # AceReason-Nemotron: Advancing Math and Code Reasoning through Reinforcement Learning
19
 
20
+ This repository contains the model for AceReason-Nemotron 1.1 as presented in [AceReason-Nemotron 1.1: Advancing Math and Code Reasoning through SFT and RL Synergy](https://huggingface.co/papers/2506.13284).
21
+
22
  <p align="center">
23
 
24
  [![Technical Report](https://img.shields.io/badge/2505.16400-Technical_Report-blue)](https://arxiv.org/abs/2505.16400)
 
31
 
32
  ## 🔥News
33
  - **6/16/2025**: We are excited to share our new release combining SFT with RL: **AceReason-Nemotron-1.1-7B**
34
+ - Paper: https://huggingface.co/papers/2506.13284
35
  - Model: https://huggingface.co/nvidia/AceReason-Nemotron-1.1-7B
36
  - 4M SFT Data: https://huggingface.co/datasets/nvidia/AceReason-1.1-SFT
37
  - **6/11/2025**: We share our evaluation toolkit at [AceReason Evaluation](https://huggingface.co/nvidia/AceReason-Nemotron-14B/blob/main/README_EVALUATION.md) including:
 
66
  | [AceReason-Nemotron-7B 🤗](https://huggingface.co/nvidia/AceReason-Nemotron-7B)| 69.0 | 53.6 | 51.8 | 44.1 |
67
  | [AceReason-Nemotron-14B 🤗](https://huggingface.co/nvidia/AceReason-Nemotron-14B)| 78.6 | 67.4 | 61.1 | 54.9 |
68
 
 
 
 
 
69
  ## How to use
70
  ```python
71
  import torch
 
98
  response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
99
  ```
100
 
 
101
  ## Usage Recommendations
102
 
103
  1. Don't include a system prompt; instead, place all instructions directly in the user prompt.
 
107
  question = "" # code question
108
  starter_code = "" # starter code function header
109
 
110
+ code_instruction_nostartercode = """Write Python code to solve the problem. Please place the solution code in the following format:
111
+ ```python
112
+ # Your solution code here
113
+ ```"""
114
+ code_instruction_hasstartercode = """Please place the solution code in the following format:
115
+ ```python
116
+ # Your solution code here
117
+ ```"""
118
  if starter_code != "":
119
+     question += "\n\n" + "Solve the problem starting with the provided function header.\n\nFunction header:\n" + "```\n" + starter_code + "\n```"
127
+     question += "\n\n" + code_instruction_hasstartercode
130
  else:
131
+     question += "\n\n" + code_instruction_nostartercode
134
 
135
+ final_prompt = "<|User|>" + question + "<|Assistant|><think>\n"
137
  ```
138
  4. Our inference engine for evaluation is **vLLM==0.7.3** using top-p=0.95, temperature=0.6, max_tokens=32768.
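
The prompt-construction steps in the recommendations above can be sketched as one helper function. This is a minimal sketch, not the authors' code: the names `build_code_prompt`, `FENCE`, `SOLUTION_FORMAT`, and `SAMPLING` are hypothetical, and the dictionary only restates the sampling values quoted above without calling any inference engine.

```python
# Built at runtime so this example never contains a literal code fence.
FENCE = "`" * 3

# Shared tail of both instruction variants shown in the README snippet.
SOLUTION_FORMAT = (
    "Please place the solution code in the following format:\n"
    f"{FENCE}python\n# Your solution code here\n{FENCE}"
)

# Sampling values quoted from the recommendation above (hypothetical container;
# pass them to whichever inference engine you use).
SAMPLING = {"top_p": 0.95, "temperature": 0.6, "max_tokens": 32768}


def build_code_prompt(question: str, starter_code: str = "") -> str:
    """Return the final prompt string for a code question.

    No system prompt is used; all instructions go into the user turn.
    """
    if starter_code != "":
        # Variant with a function header supplied by the problem.
        question += (
            "\n\nSolve the problem starting with the provided function header."
            "\n\nFunction header:\n" + FENCE + "\n" + starter_code + "\n" + FENCE
        )
        question += "\n\n" + SOLUTION_FORMAT
    else:
        # Variant without starter code.
        question += "\n\nWrite Python code to solve the problem. " + SOLUTION_FORMAT
    # Special tokens as written in the README snippet.
    return "<|User|>" + question + "<|Assistant|><think>\n"
```

Calling `build_code_prompt("Add two numbers.")` yields the no-starter-code prompt, and passing a second argument such as `"def add(a, b):"` switches to the function-header variant.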
139
 
 
141
 
142
  Please check the evaluation code, scripts, and cached prediction files at https://huggingface.co/nvidia/AceReason-Nemotron-14B/blob/main/README_EVALUATION.md
143
 
144
+ ## Code
145
+
146
+ Our code is available at https://github.com/NVIDIA/TRT-LLM/tree/main/examples/research/ace_reason
147
 
148
  ## Correspondence to
149
  Yang Chen ([email protected]), Zhuolin Yang ([email protected]), Zihan Liu ([email protected]), Chankyu Lee ([email protected]), Wei Ping ([email protected])
150
 
 
151
  ## License
152
  Your use of this model is governed by the [NVIDIA Open Model License](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/).
153
 
 
154
  ## Citation
155
  ```
156
  @article{chen2025acereason,
 
159
  journal={arXiv preprint arXiv:2505.16400},
160
  year={2025}
161
  }
162
+ ```