Add link to Github repository and paper page #9
opened by nielsr (HF Staff)

README.md CHANGED
````diff
@@ -1,17 +1,15 @@
 ---
+language:
+- en
 library_name: transformers
 license: other
 license_name: nvidia-open-model-license
-license_link:
-https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/
-
+license_link: https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/
 pipeline_tag: text-generation
-language:
-- en
 tags:
-
-
-
+- nvidia
+- llama-3
+- pytorch
 ---
 
 # Llama-3.1-Nemotron-Ultra-253B-v1
@@ -28,18 +26,18 @@ The model underwent a multi-phase post-training process to enhance both its reas
 
 This model is ready for commercial use.
 
-For more details on how the model was trained, please see our [technical report](https://
+For more details on how the model was trained, please see our [technical report](https://huggingface.co/papers/2505.00949) and [blog](https://developer.nvidia.com/blog/build-enterprise-ai-agents-with-advanced-open-nvidia-llama-nemotron-reasoning-models/).
 
 
 
 This model is part of the Llama Nemotron Collection. You can find the other model(s) in this family here:
 
 - [Llama-3.1-Nemotron-Nano-8B-v1](https://huggingface.co/nvidia/Llama-3.1-Nemotron-Nano-8B-v1)
-- [Llama-3.3-Nemotron-Super-49B-v1](https://huggingface.co/nvidia/Llama-
+- [Llama-3.3-Nemotron-Super-49B-v1](https://huggingface.co/nvidia/Llama-3_3-Nemotron-Super-49B-v1)
 
 ## License/Terms of Use
 
-GOVERNING TERMS: Your use of this model is governed by the [NVIDIA Open Model License.](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/) Additional Information: [Llama 3.1 Community License Agreement](https://www.llama.com/
+GOVERNING TERMS: Your use of this model is governed by the [NVIDIA Open Model License](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/). Additional Information: [Llama 3.1 Community License Agreement](https://www.llama.com/llama3_1/license/). Built with Llama.
 
 **Model Developer:** NVIDIA
 
@@ -55,15 +53,17 @@ Developers designing AI Agent systems, chatbots, RAG systems, and other AI-power
 
 ## References
 
-* [
-* [
-* [
-* [
+* [[2505.00949] Llama-Nemotron: Efficient Reasoning Models](https://huggingface.co/papers/2505.00949)
+* [[2502.00203] Reward-aware Preference Optimization: A Unified Mathematical Framework for Model Alignment](https://arxiv.org/abs/2502.00203)
+* [[2411.19146] Puzzle: Distillation-Based NAS for Inference-Optimized LLMs](https://arxiv.org/abs/2411.19146)
+* [[2503.18908] FFN Fusion: Rethinking Sequential Computation in Large Language Models](https://arxiv.org/abs/2503.18908)
 
 ## Model Architecture
 **Architecture Type:** Dense decoder-only Transformer model
 **Network Architecture:** Llama-3.1-405B-Instruct, customized through Neural Architecture Search (NAS)
 
+**Github:** https://github.com/NVIDIA/NeMo
+
 **This model was developed based on Llama-3.1-405B-Instruct <br>
 ** This model has 253B model parameters. <br>
 
@@ -248,7 +248,13 @@ Data Labeling for Evaluation Datasets:
 User Prompt Template:
 
 ```
-"What is the correct answer to this question: {question}
+"What is the correct answer to this question: {question}
+Choices:
+A. {option_A}
+B. {option_B}
+C. {option_C}
+D. {option_D}
+Let's think step by step, and put the final answer (should be a single letter A, B, C, or D) into a \boxed{}"
 ```
 
 ### AIME25
@@ -261,7 +267,8 @@ User Prompt Template:
 User Prompt Template:
 
 ```
-"Below is a math question. I want you to reason through the steps and then give a final answer. Your final answer should be in \boxed{}
+"Below is a math question. I want you to reason through the steps and then give a final answer. Your final answer should be in \boxed{}.
+Question: {question}"
 ```
 
 ### BFCL V2 Live
@@ -339,7 +346,8 @@ You will use the following starter code to write the solution to the problem and
 User Prompt Template:
 
 ```
-"Below is a math question. I want you to reason through the steps and then give a final answer. Your final answer should be in \boxed{}.
+"Below is a math question. I want you to reason through the steps and then give a final answer. Your final answer should be in \boxed{}.
+Question: {question}"
 ```
 
 ### JudgeBench
````
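For context on the evaluation sections touched by this diff: the user prompt templates are plain format strings with `{…}` placeholders. A minimal sketch of filling the GPQA-style template in Python — the template text is copied from the diff; the sample question and option values are purely illustrative:

```python
# GPQA-style user prompt template from the model card; the literal \boxed{}
# at the end is escaped as {{}} so str.format leaves it intact.
GPQA_TEMPLATE = (
    "What is the correct answer to this question: {question}\n"
    "Choices:\n"
    "A. {option_A}\n"
    "B. {option_B}\n"
    "C. {option_C}\n"
    "D. {option_D}\n"
    "Let's think step by step, and put the final answer "
    "(should be a single letter A, B, C, or D) into a \\boxed{{}}"
)

# Illustrative example values (not from the GPQA dataset).
prompt = GPQA_TEMPLATE.format(
    question="Which particle mediates the electromagnetic force?",
    option_A="Gluon",
    option_B="Photon",
    option_C="W boson",
    option_D="Higgs boson",
)
print(prompt)
```

The filled string is what goes into the user turn of the chat template when scoring; the model's final `\boxed{…}` letter is then extracted and compared to the gold answer.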