Add link to Github repository and paper page #9
opened by nielsr (HF Staff)

README.md CHANGED
````diff
@@ -1,17 +1,15 @@
 ---
+language:
+- en
 library_name: transformers
 license: other
 license_name: nvidia-open-model-license
-license_link:
-https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/
-
+license_link: https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/
 pipeline_tag: text-generation
-language:
-- en
 tags:
-
-
-
+- nvidia
+- llama-3
+- pytorch
 ---
 
 # Llama-3.1-Nemotron-Ultra-253B-v1
@@ -28,18 +26,18 @@ The model underwent a multi-phase post-training process to enhance both its reas
 
 This model is ready for commercial use.
 
-For more details on how the model was trained, please see our [technical report](https://
+For more details on how the model was trained, please see our [technical report](https://huggingface.co/papers/2505.00949) and [blog](https://developer.nvidia.com/blog/build-enterprise-ai-agents-with-advanced-open-nvidia-llama-nemotron-reasoning-models/).
 
 
 
 This model is part of the Llama Nemotron Collection. You can find the other model(s) in this family here:
 
 - [Llama-3.1-Nemotron-Nano-8B-v1](https://huggingface.co/nvidia/Llama-3.1-Nemotron-Nano-8B-v1)
-- [Llama-3.3-Nemotron-Super-49B-v1](https://huggingface.co/nvidia/Llama-
+- [Llama-3.3-Nemotron-Super-49B-v1](https://huggingface.co/nvidia/Llama-3_3-Nemotron-Super-49B-v1)
 
 ## License/Terms of Use
 
-GOVERNING TERMS: Your use of this model is governed by the [NVIDIA Open Model License.](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/) Additional Information: [Llama 3.1 Community License Agreement](https://www.llama.com/
+GOVERNING TERMS: Your use of this model is governed by the [NVIDIA Open Model License](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/). Additional Information: [Llama 3.1 Community License Agreement](https://www.llama.com/llama3_1/license/). Built with Llama.
 
 **Model Developer:** NVIDIA
 
@@ -55,15 +53,17 @@ Developers designing AI Agent systems, chatbots, RAG systems, and other AI-power
 
 ## References
 
-* [
-* [
-* [
-* [
+* [[2505.00949] Llama-Nemotron: Efficient Reasoning Models](https://huggingface.co/papers/2505.00949)
+* [[2502.00203] Reward-aware Preference Optimization: A Unified Mathematical Framework for Model Alignment](https://arxiv.org/abs/2502.00203)
+* [[2411.19146] Puzzle: Distillation-Based NAS for Inference-Optimized LLMs](https://arxiv.org/abs/2411.19146)
+* [[2503.18908] FFN Fusion: Rethinking Sequential Computation in Large Language Models](https://arxiv.org/abs/2503.18908)
 
 ## Model Architecture
 **Architecture Type:** Dense decoder-only Transformer model
 **Network Architecture:** Llama-3.1-405B-Instruct, customized through Neural Architecture Search (NAS)
 
+**Github:** https://github.com/NVIDIA/NeMo
+
 **This model was developed based on Llama-3.1-405B-Instruct <br>
 ** This model has 253B model parameters. <br>
 
@@ -248,7 +248,13 @@ Data Labeling for Evaluation Datasets:
 User Prompt Template:
 
 ```
-"What is the correct answer to this question: {question}
+"What is the correct answer to this question: {question}
+Choices:
+A. {option_A}
+B. {option_B}
+C. {option_C}
+D. {option_D}
+Let's think step by step, and put the final answer (should be a single letter A, B, C, or D) into a \boxed{}"
 ```
 
 ### AIME25
@@ -261,7 +267,8 @@ User Prompt Template:
 User Prompt Template:
 
 ```
-"Below is a math question. I want you to reason through the steps and then give a final answer. Your final answer should be in \boxed{}
+"Below is a math question. I want you to reason through the steps and then give a final answer. Your final answer should be in \boxed{}.
+Question: {question}"
 ```
 
 ### BFCL V2 Live
@@ -339,7 +346,8 @@ You will use the following starter code to write the solution to the problem and
 User Prompt Template:
 
 ```
-"Below is a math question. I want you to reason through the steps and then give a final answer. Your final answer should be in \boxed{}.
+"Below is a math question. I want you to reason through the steps and then give a final answer. Your final answer should be in \boxed{}.
+Question: {question}"
 ```
 
 ### JudgeBench
````
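For context on the evaluation sections touched by this diff: the user prompt templates are plain format strings with `{…}` placeholders. A minimal sketch of filling the GPQA-style template in Python — the template text is copied from the diff; the sample question and option values are purely illustrative:

```python
# GPQA-style user prompt template from the model card; the literal \boxed{}
# at the end is escaped as {{}} so str.format leaves it intact.
GPQA_TEMPLATE = (
    "What is the correct answer to this question: {question}\n"
    "Choices:\n"
    "A. {option_A}\n"
    "B. {option_B}\n"
    "C. {option_C}\n"
    "D. {option_D}\n"
    "Let's think step by step, and put the final answer "
    "(should be a single letter A, B, C, or D) into a \\boxed{{}}"
)

# Illustrative example values (not from the GPQA dataset).
prompt = GPQA_TEMPLATE.format(
    question="Which particle mediates the electromagnetic force?",
    option_A="Gluon",
    option_B="Photon",
    option_C="W boson",
    option_D="Higgs boson",
)
print(prompt)
```

The filled string is what goes into the user turn of the chat template when scoring; the model's final `\boxed{…}` letter is then extracted and compared to the gold answer.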