---
library_name: transformers
license: other
license_name: nvidia-internal-scientific-research-and-development-model-license
license_link: >-
  https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-internal-scientific-research-and-development-model-license/
pipeline_tag: text-generation
language:
- en
- de
- es
- fr
- it
- ko
- pt
- ru
- ja
- zh
tags:
- nvidia
- pytorch
- nemotron-h
---

# Nemotron-H-47B-Base-8K

## Model Overview

NVIDIA Nemotron-H-47B-Base-8K is a large language model (LLM) developed by NVIDIA, designed as a completion model for a given piece of text. It uses a hybrid model architecture that consists primarily of Mamba-2 and MLP layers combined with just five Attention layers. The model was pruned and distilled from Nemotron-H-56B-Base-8K using 63B tokens, and features an 8K context length. The supported languages include: English, German, Spanish, French, Italian, Korean, Portuguese, Russian, Japanese, and Chinese. For more detailed information on the model architecture, training, and evaluation, please see the [project page](https://research.nvidia.com/labs/adlr/nemotronh/) and the [technical report](https://arxiv.org/abs/2504.03624).
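
The hybrid layer layout can be inspected from the released configuration. The sketch below assumes the checkpoint's config exposes a `hybrid_override_pattern` string with one character per layer, which is how released Nemotron-H configs encode the Mamba-2/Attention/MLP ordering; treat the field name as an assumption, since it may differ across releases.

```python
from collections import Counter

from transformers import AutoConfig

# Load only the configuration (no model weights are needed for this check).
config = AutoConfig.from_pretrained("nvidia/Nemotron-H-47B-Base-8K", trust_remote_code=True)

# Assumed encoding: 'M' = Mamba-2, '*' = Attention, '-' = MLP, one character per layer.
pattern = getattr(config, "hybrid_override_pattern", None)
if pattern is not None:
    print(Counter(pattern))  # expect mostly 'M' and '-', with five '*' entries
```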

For best performance on a given task, users are encouraged to customize the model using the [NeMo Framework](https://docs.nvidia.com/nemo-framework/index.html) suite of customization tools, including Parameter-Efficient Fine-Tuning (P-tuning, Adapters, LoRA, and more) and Model Alignment (SFT, SteerLM, RLHF, and more) using [NeMo-Aligner](https://github.com/NVIDIA/NeMo-Aligner).
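
As a rough illustration of this kind of lightweight customization (not NVIDIA's recommended NeMo recipe), here is a minimal LoRA sketch using the Hugging Face `peft` library; the `target_modules` names are hypothetical and should be matched against the checkpoint's actual module names.

```python
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Load the base model; LoRA adapters are attached to selected linear layers.
model = AutoModelForCausalLM.from_pretrained(
    "nvidia/Nemotron-H-47B-Base-8K",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
)

# Hypothetical target names: inspect model.named_modules() to find the real
# linear projections (Mamba-2 in/out projections, attention and MLP layers).
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["in_proj", "out_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```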

This model is for research and development only.

This model is part of the Nemotron-H Collection. You can find the models in this family here:
- [Nemotron-H-56B-Base-8K](https://huggingface.co/nvidia/Nemotron-H-56B-Base-8K)
- [Nemotron-H-47B-Base-8K](https://huggingface.co/nvidia/Nemotron-H-47B-Base-8K)
- [Nemotron-H-8B-Base-8K](https://huggingface.co/nvidia/Nemotron-H-8B-Base-8K)

## License/Terms of Use

GOVERNING TERMS: Use of this model is governed by the [NVIDIA Internal Scientific Research and Development Model License](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-internal-scientific-research-and-development-model-license/).

**Model Developer:** NVIDIA

**Model Dates:**

[…]

The pretraining data has a cutoff date of September 2024.

## Use Case:

This model is intended for developers and researchers building LLMs.

## Release Date:

4/12/2025

## References

- [\[2504.03624\] Nemotron-H: A Family of Accurate and Efficient Hybrid Mamba-Transformer Models](https://arxiv.org/abs/2504.03624)

## Model Architecture

- Architecture Type: Hybrid Mamba-Transformer
- Network Architecture: Nemotron-H

This model has 47B model parameters.
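
As a quick sanity check, the parameter count can be read from the loaded checkpoint. A minimal sketch (loading the full model requires enough GPU or CPU memory for roughly 94 GB of bf16 weights):

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "nvidia/Nemotron-H-47B-Base-8K", torch_dtype=torch.bfloat16, trust_remote_code=True
)

# Sum the element counts of every parameter tensor in the model.
total = sum(p.numel() for p in model.parameters())
print(f"{total / 1e9:.1f}B parameters")  # expected to print roughly 47B
```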

## Input
- Input Type(s): Text

[…]

Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA’s hardware (e.g., GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.

## Software Integration
- Runtime Engine(s): NeMo 24.12
- Supported Hardware Microarchitecture Compatibility: NVIDIA H100-80GB, NVIDIA A100
- Operating System(s): Linux

[…]

As this is a base model, no explicit prompt format is recommended or required.

### Example

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("nvidia/Nemotron-H-47B-Base-8K", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "nvidia/Nemotron-H-47B-Base-8K",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
)

prompt = "When was NVIDIA founded?"

# Tokenize the prompt and generate a completion
inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to(model.device)
outputs = model.generate(**inputs)
print(tokenizer.decode(outputs[0]))
```
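
Note that `generate` is called with default settings here, so the completion will be short; pass `max_new_tokens` (for example, `model.generate(**inputs, max_new_tokens=64)`) to control the length of the generated continuation.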

## Training, Testing, and Evaluation Datasets

### Training & Testing Datasets:

[…]

**Data Labeling for Training Datasets:**
Hybrid: Automated, Human, Synthetic

#### Commonsense Understanding Evaluations:

| ARC Challenge 25-shot | Hellaswag 10-shot | Winogrande 5-shot | CommonsenseQA 7-shot |
|-------------|--------------|-----------------|------------------|
| 94.6 | 87.9 | 83.9 | 87.3 |

- ARC (AI2 Reasoning Challenge)-Challenge - The challenge set of questions from a benchmark that contains grade-school-level, multiple-choice science questions to assess the question-answering ability of language models. [Dataset](https://huggingface.co/datasets/allenai/ai2_arc)
- Hellaswag - Tests the ability of a language model to correctly finish the provided context from a choice of possible options. [Dataset](https://huggingface.co/datasets/Rowan/hellaswag)
- Winogrande - Tests the ability to choose the right option for a given sentence, which requires commonsense reasoning. [Dataset](https://huggingface.co/datasets/allenai/winogrande)
- CommonsenseQA - A multiple-choice question-answering dataset that requires different types of commonsense knowledge to predict the correct answers. [Dataset](https://huggingface.co/datasets/tau/commonsense_qa)
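
Few-shot scores like those above are typically reproduced with a harness such as EleutherAI's lm-evaluation-harness. The sketch below is an approximation, not the official evaluation recipe; the task name and settings follow lm-evaluation-harness v0.4.x conventions and are assumptions.

```python
import lm_eval

# Evaluate the checkpoint on ARC-Challenge with 25 few-shot examples.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=nvidia/Nemotron-H-47B-Base-8K,trust_remote_code=True,dtype=bfloat16",
    tasks=["arc_challenge"],
    num_fewshot=25,
    batch_size=8,
)
print(results["results"]["arc_challenge"])  # accuracy metrics for the task
```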

#### Coding Evaluations:

| MBPP | MBPP+ | HumanEval | HumanEval+ |
|-------------|--------------|-----------------|------------------|
| 75.9 | 65.6 | 61.0 | 56.1 |

- MBPP (Mostly Basic Python Programming Problems) - Evaluates the ability to generate solutions for Python programming tasks. [Dataset](https://github.com/google-research/google-research/tree/master/mbpp)
- MBPP+ - Extended version of MBPP with additional validation. [Dataset](https://huggingface.co/datasets/evalplus/mbppplus)
- HumanEval - Tests code generation and completion abilities in Python. [Dataset](https://github.com/openai/human-eval)

#### Math Evaluations:

| GSM8K | MATH | MATH Lvl 5 | MATH-500 |
|--------------|------------|------------|------------|
| 93.3 | 57.4 | 34.2 | 57.9 |

- GSM8K (Grade School Math 8K) - Evaluates grade-school-level mathematical word-problem solving. [Dataset](https://github.com/openai/grade-school-math)
- MATH Lvl 5 - Only the most difficult questions from the MATH dataset. [Dataset](https://github.com/hendrycks/math)
- MATH-500 - Tests advanced mathematical problem solving across algebra, geometry, and calculus. [Dataset](https://huggingface.co/datasets/HuggingFaceH4/MATH-500)

#### Other Evaluations:

| MMLU | MMLU Pro |
|------------------|------------|
| 83.6 | 61.8 |

- MMLU - Tests knowledge across 57 subjects including science, humanities, math, and more. [Dataset](https://github.com/hendrycks/test)
- MMLU Pro - Evaluates language understanding models across a broad range of challenging, reasoning-focused questions from 14 diverse domains. [Dataset](https://huggingface.co/datasets/TIGER-Lab/MMLU-Pro)

## Potential Known Risks for Usage

[…]