Add link to Github repository and paper page

#9 opened by nielsr (HF Staff)
Files changed (1)
  1. README.md +26 -18
README.md CHANGED
@@ -1,17 +1,15 @@
1
  ---
2
  library_name: transformers
3
  license: other
4
  license_name: nvidia-open-model-license
5
- license_link: >-
6
- https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/
7
-
8
  pipeline_tag: text-generation
9
- language:
10
- - en
11
  tags:
12
- - nvidia
13
- - llama-3
14
- - pytorch
15
  ---
16
 
17
  # Llama-3.1-Nemotron-Ultra-253B-v1
@@ -28,18 +26,18 @@ The model underwent a multi-phase post-training process to enhance both its reas
28
 
29
  This model is ready for commercial use.
30
 
31
- For more details on how the model was trained, please see our [technical report](https://arxiv.org/abs/2505.00949) and [blog](https://developer.nvidia.com/blog/build-enterprise-ai-agents-with-advanced-open-nvidia-llama-nemotron-reasoning-models/).
32
 
33
  ![Training Flow](./training_flowchart.png)
34
 
35
  This model is part of the Llama Nemotron Collection. You can find the other model(s) in this family here:
36
 
37
  - [Llama-3.1-Nemotron-Nano-8B-v1](https://huggingface.co/nvidia/Llama-3.1-Nemotron-Nano-8B-v1)
38
- - [Llama-3.3-Nemotron-Super-49B-v1](https://huggingface.co/nvidia/Llama-3\_3-Nemotron-Super-49B-v1)
39
 
40
  ## License/Terms of Use
41
 
42
- GOVERNING TERMS: Your use of this model is governed by the [NVIDIA Open Model License.](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/) Additional Information: [Llama 3.1 Community License Agreement](https://www.llama.com/llama3\_1/license/). Built with Llama.
43
 
44
  **Model Developer:** NVIDIA
45
 
@@ -55,15 +53,17 @@ Developers designing AI Agent systems, chatbots, RAG systems, and other AI-power
55
 
56
  ## References
57
 
58
- * [\[2505.00949\] Llama-Nemotron: Efficient Reasoning Models](https://arxiv.org/abs/2505.00949)
59
- * [\[2502.00203\] Reward-aware Preference Optimization: A Unified Mathematical Framework for Model Alignment](https://arxiv.org/abs/2502.00203)
60
- * [\[2411.19146\]Puzzle: Distillation-Based NAS for Inference-Optimized LLMs](https://arxiv.org/abs/2411.19146)
61
- * [\[2503.18908\]FFN Fusion: Rethinking Sequential Computation in Large Language Models](https://arxiv.org/abs/2503.18908)
62
 
63
  ## Model Architecture
64
  **Architecture Type:** Dense decoder-only Transformer model
65
  **Network Architecture:** Llama-3.1-405B-Instruct, customized through Neural Architecture Search (NAS)
66
 
67
  **This model was developed based on Llama-3.1-405B-Instruct** <br>
68
  **This model has 253B model parameters.** <br>
69
 
@@ -248,7 +248,13 @@ Data Labeling for Evaluation Datasets:
248
  User Prompt Template:
249
 
250
  ```
251
- "What is the correct answer to this question: {question}\nChoices:\nA. {option_A}\nB. {option_B}\nC. {option_C}\nD. {option_D}\nLet's think step by step, and put the final answer (should be a single letter A, B, C, or D) into a \boxed{}"
252
  ```
253
 
254
  ### AIME25
@@ -261,7 +267,8 @@ User Prompt Template:
261
  User Prompt Template:
262
 
263
  ```
264
- "Below is a math question. I want you to reason through the steps and then give a final answer. Your final answer should be in \boxed{}.\nQuestion: {question}"
265
  ```
266
 
267
  ### BFCL V2 Live
@@ -339,7 +346,8 @@ You will use the following starter code to write the solution to the problem and
339
  User Prompt Template:
340
 
341
  ```
342
- "Below is a math question. I want you to reason through the steps and then give a final answer. Your final answer should be in \boxed{}.\nQuestion: {question}"
343
  ```
344
 
345
  ### JudgeBench
 
1
  ---
2
+ language:
3
+ - en
4
  library_name: transformers
5
  license: other
6
  license_name: nvidia-open-model-license
7
+ license_link: https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/
8
  pipeline_tag: text-generation
9
  tags:
10
+ - nvidia
11
+ - llama-3
12
+ - pytorch
13
  ---
14
 
15
  # Llama-3.1-Nemotron-Ultra-253B-v1
 
26
 
27
  This model is ready for commercial use.
28
 
29
+ For more details on how the model was trained, please see our [technical report](https://huggingface.co/papers/2505.00949) and [blog](https://developer.nvidia.com/blog/build-enterprise-ai-agents-with-advanced-open-nvidia-llama-nemotron-reasoning-models/).
30
 
31
  ![Training Flow](./training_flowchart.png)
32
 
33
  This model is part of the Llama Nemotron Collection. You can find the other model(s) in this family here:
34
 
35
  - [Llama-3.1-Nemotron-Nano-8B-v1](https://huggingface.co/nvidia/Llama-3.1-Nemotron-Nano-8B-v1)
36
+ - [Llama-3.3-Nemotron-Super-49B-v1](https://huggingface.co/nvidia/Llama-3_3-Nemotron-Super-49B-v1)
37
 
38
  ## License/Terms of Use
39
 
40
+ GOVERNING TERMS: Your use of this model is governed by the [NVIDIA Open Model License](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/). Additional Information: [Llama 3.1 Community License Agreement](https://www.llama.com/llama3_1/license/). Built with Llama.
41
 
42
  **Model Developer:** NVIDIA
43
 
 
53
 
54
  ## References
55
 
56
+ * [[2505.00949] Llama-Nemotron: Efficient Reasoning Models](https://huggingface.co/papers/2505.00949)
57
+ * [[2502.00203] Reward-aware Preference Optimization: A Unified Mathematical Framework for Model Alignment](https://arxiv.org/abs/2502.00203)
58
+ * [[2411.19146] Puzzle: Distillation-Based NAS for Inference-Optimized LLMs](https://arxiv.org/abs/2411.19146)
59
+ * [[2503.18908] FFN Fusion: Rethinking Sequential Computation in Large Language Models](https://arxiv.org/abs/2503.18908)
60
 
61
  ## Model Architecture
62
  **Architecture Type:** Dense decoder-only Transformer model
63
  **Network Architecture:** Llama-3.1-405B-Instruct, customized through Neural Architecture Search (NAS)
64
 
65
+ **GitHub:** https://github.com/NVIDIA/NeMo
66
+
67
  **This model was developed based on Llama-3.1-405B-Instruct** <br>
68
  **This model has 253B model parameters.** <br>
69
 
 
248
  User Prompt Template:
249
 
250
  ```
251
+ "What is the correct answer to this question: {question}
252
+ Choices:
253
+ A. {option_A}
254
+ B. {option_B}
255
+ C. {option_C}
256
+ D. {option_D}
257
+ Let's think step by step, and put the final answer (should be a single letter A, B, C, or D) into a \boxed{}"
258
  ```
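The template above can be filled programmatically. A minimal sketch, assuming the placeholder names shown in the card (`question`, `option_A` … `option_D`); the helper name `build_gpqa_prompt` is illustrative, and the literal `\boxed{}` braces have to be escaped so `str.format` leaves them alone:

```python
# Illustrative helper for the GPQA user-prompt template shown above.
# Placeholder names mirror the card; the doubled braces keep the
# literal \boxed{} out of str.format's substitution.
GPQA_TEMPLATE = (
    "What is the correct answer to this question: {question}\n"
    "Choices:\n"
    "A. {option_A}\n"
    "B. {option_B}\n"
    "C. {option_C}\n"
    "D. {option_D}\n"
    "Let's think step by step, and put the final answer "
    "(should be a single letter A, B, C, or D) into a \\boxed{{}}"
)

def build_gpqa_prompt(question: str, option_A: str, option_B: str,
                      option_C: str, option_D: str) -> str:
    """Return the evaluation prompt with all five placeholders filled."""
    return GPQA_TEMPLATE.format(question=question, option_A=option_A,
                                option_B=option_B, option_C=option_C,
                                option_D=option_D)
```

The resulting string is what goes into the user turn of the chat template during evaluation.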
259
 
260
  ### AIME25
 
267
  User Prompt Template:
268
 
269
  ```
270
+ "Below is a math question. I want you to reason through the steps and then give a final answer. Your final answer should be in \boxed{}.
271
+ Question: {question}"
272
  ```
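As with the GPQA template, this one can be filled with a single substitution. A minimal sketch (the helper name `build_math_prompt` is illustrative; `{question}` is the only placeholder, and the `\boxed{}` braces are escaped for `str.format`):

```python
# Illustrative helper for the math user-prompt template shown above.
# Doubled braces keep the literal \boxed{} intact after formatting.
MATH_TEMPLATE = (
    "Below is a math question. I want you to reason through the steps "
    "and then give a final answer. Your final answer should be in "
    "\\boxed{{}}.\nQuestion: {question}"
)

def build_math_prompt(question: str) -> str:
    """Return the evaluation prompt with the question slotted in."""
    return MATH_TEMPLATE.format(question=question)
```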
273
 
274
  ### BFCL V2 Live
 
346
  User Prompt Template:
347
 
348
  ```
349
+ "Below is a math question. I want you to reason through the steps and then give a final answer. Your final answer should be in \boxed{}.
350
+ Question: {question}"
351
  ```
352
 
353
  ### JudgeBench