suhara committed
Commit 75ee340 · verified · 1 Parent(s): 2cc1904

Update README.md

  1. README.md +84 -41
README.md CHANGED
@@ -1,5 +1,47 @@
1
  # Nemotron-H-47B-Base-8K
2
 
3
  **Model Developer:** NVIDIA
4
 
5
  **Model Dates:**
@@ -12,31 +54,23 @@ September 2024
12
 
13
  The pretraining data has a cutoff date of September 2024.
14
 
15
- ## Model Overview
16
-
17
- NVIDIA Nemotron-H-47B-Base-8K is a large language model (LLM) developed by NVIDIA, designed as a completion model for a given piece of text. It uses a hybrid model architecture that consists primarily of Mamba-2 and MLP layers combined with just five Attention layers. The model is pruned and distilled from Nemotron-H-47B-Base-8K using 63B tokens, and features an 8K context length. The supported languages include: English, German, Spanish, French, Italian, Korean, Portuguese, Russian, Japanese, and Chinese.
18
 
19
- For best performance on a given task, users are encouraged to customize the model using the NeMo Framework suite of customization tools including Parameter-Efficient Fine-Tuning (P-tuning, Adapters, LoRA, and more), and Model Alignment (SFT, SteerLM, RLHF, and more) using [NeMo-Aligner](https://github.com/NVIDIA/NeMo-Aligner).
20
 
21
- This model is for research and development only.
22
 
23
- ## License/Terms of Use
24
- GOVERNING TERMS: Use of this model is governed by the NVIDIA Internal Scientific Research and Development Model License.
25
- NVIDIA Internal Scientific Research and Development Model License
26
 
27
- ## Model Architecture
28
- - Architecture Type: Transformer
29
- - Network Architecture: Nemotron-Hybrid
30
 
31
- This model has 47B of model parameters.
32
 
33
- ### Deployment Geography: Global
34
-
35
- ### Use Case: This model is intended for developers and researchers building LLMs
36
 
37
- ### Release Date: 04/09/2025
38
- Huggingface 04/09/2025 via https://huggingface.co/
39
- NGC 04/09/2025 via https://catalog.ngc.nvidia.com/models
40
 
41
  ## Input
42
  - Input Type(s): Text
@@ -52,7 +86,7 @@ NGC 04/09/2025 via https://catalog.ngc.nvidia.com/models
52
  Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA’s hardware (e.g., GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.
53
 
54
  ## Software Integration
55
- - Runtime Engine(s): NeMo 24.09
56
  - Supported Hardware Microarchitecture Compatibility: NVIDIA H100-80GB, NVIDIA A100
57
  - Operating System(s): Linux
58
 
@@ -63,6 +97,22 @@ Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated sys
63
 
64
  As this is a base model, no explicit prompt format is recommended or required.
65
 
66
  ## Training, Testing, and Evaluation Datasets
67
 
68
  #Training & Testing Datasets:
@@ -84,19 +134,17 @@ Hybrid: Automated, Human, Synthetic
84
  **Data Labeling for Training Datasets:**
85
  Hybrid: Automated, Human, Synthetic
86
 
87
- #### Reasoning Evaluations:
88
 
89
  | ARC Challenge 25-shot | Hellaswag 10-shot | Winogrande 5-shot | CommonsenseQA 7-shot |
90
  |-------------|--------------|-----------------|------------------|
91
  | 94.6 | 87.9 | 83.9 | 87.3 |
92
 
93
- ARC (Ai2 reasoning challenge)-Challenge - The challenge set of questions from a benchmark that contains grade-school level, multiple-choice science questions to assess question answering ability of language models. [Dataset](https://huggingface.co/datasets/allenai/ai2_arc)
94
-
95
- Hellaswag - Tests the ability of a language model to correctly finish the provided context from a choice of possible options. [Dataset](https://huggingface.co/datasets/Rowan/hellaswag )
96
 
97
- Winogrande - Tests the ability to choose the right option for a given sentence which requires commonsense reasoning. [Dataset](https://huggingface.co/datasets/allenai/winogrande )
98
-
99
- CommonsenseQA - A multiple-choice question answering dataset that requires different type of commonsense knowledge to predict the correct answers. [Dataset](https://huggingface.co/datasets/tau/commonsense_qa )
100
 
101
  #### Coding Evaluations:
102
 
@@ -104,11 +152,9 @@ CommonsenseQA - A multiple-choice question answering dataset that requires diffe
104
  |-------------|--------------|-----------------|------------------|
105
  | 75.9 | 65.6| 61.0 | 56.1 |
106
 
107
- MBPP (Mostly Basic Python Programming Problems) - Evaluates ability to generate solutions for Python programming tasks. [Dataset](https://github.com/google-research/google-research/tree/master/mbpp)
108
-
109
- MBPP+ - Extended version of MBPP with additional validation. [Dataset](https://huggingface.co/datasets/evalplus/mbppplus)
110
-
111
- HumanEval - Tests code generation and completion abilities in Python. [Dataset](https://github.com/openai/human-eval)
112
 
113
  #### Math Evaluations:
114
 
@@ -116,13 +162,10 @@ HumanEval - Tests code generation and completion abilities in Python. [Dataset](
116
  |--------------|------------|------------|------------|
117
  | 93.3 | 57.4 | 34.2 | 57.9 |
118
 
119
- GSM8K (Grade School Math 8K) - Evaluates grade school level mathematical word problem solving. [Dataset](https://github.com/openai/grade-school-math)
120
-
121
- MATH-500 - Tests advanced mathematical problem solving across algebra, geometry, and calculus. [Dataset](https://huggingface.co/datasets/HuggingFaceH4/MATH-500)
122
-
123
- MATH Lvl 5 - Only the most difficult questions from the MATH dataset. [Dataset](https://github.com/hendrycks/math)
124
-
125
- MATH-500 - Tests advanced mathematical problem solving across algebra, geometry, and calculus. [Dataset](https://huggingface.co/datasets/HuggingFaceH4/MATH-500)
126
 
127
  #### Other Evaluations:
128
 
@@ -131,9 +174,8 @@ MATH-500 - Tests advanced mathematical problem solving across algebra, geometry,
131
  |------------------|------------|
132
  |83.6 | 61.8 |
133
 
134
- MMLU - Tests knowledge across 57 subjects including science, humanities, math and more. [Dataset](https://github.com/hendrycks/test)
135
-
136
- MMLU Pro - Evaluates language understanding models across a broad range of challenging, reasoning-focused questions across 14 diverse domains.
137
  [Dataset](https://huggingface.co/datasets/TIGER-Lab/MMLU-Pro)
138
 
139
  ## Potential Known Risks for Usage
@@ -156,3 +198,4 @@ Please report security vulnerabilities or NVIDIA AI Concerns [here](https://www.
156
 
157
 
158
 
 
 
1
+ ---
2
+ library_name: transformers
3
+ license: other
4
+ license_name: nvidia-internal-scientific-research-and-development-model-license
5
+ license_link: >-
6
+ https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-internal-scientific-research-and-development-model-license/
7
+ pipeline_tag: text-generation
8
+ language:
9
+ - en
10
+ - de
11
+ - es
12
+ - fr
13
+ - it
14
+ - ko
15
+ - pt
16
+ - ru
17
+ - ja
18
+ - zh
19
+ tags:
20
+ - nvidia
21
+ - pytorch
22
+ - nemotron-h
23
+ ---
24
+
25
  # Nemotron-H-47B-Base-8K
26
 
27
+ ## Model Overview
28
+
29
+ NVIDIA Nemotron-H-47B-Base-8K is a large language model (LLM) developed by NVIDIA, designed as a completion model for a given piece of text. It uses a hybrid model architecture that consists primarily of Mamba-2 and MLP layers combined with just five Attention layers. The model is pruned and distilled from Nemotron-H-56B-Base-8K using 63B tokens, and features an 8K context length. The supported languages include: English, German, Spanish, French, Italian, Korean, Portuguese, Russian, Japanese, and Chinese. For more detailed information on the model architecture, training, and evaluation, please see the [project page](https://research.nvidia.com/labs/adlr/nemotronh/) and the [technical report](https://arxiv.org/abs/2504.03624).
30
+
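To keep prompt plus generated tokens inside the 8K window mentioned above, the prompt can be trimmed before generation. The following is a minimal sketch, assuming "8K" means 8,192 tokens (the README does not state the exact number) and using a hypothetical helper `fit_to_context` that is not part of any library:

```python
# Assumption: "8K" is taken to mean 8,192 tokens; check the model config for the real limit.
MAX_CONTEXT_TOKENS = 8192


def fit_to_context(token_ids, max_new_tokens, max_context=MAX_CONTEXT_TOKENS):
    """Keep the most recent prompt tokens so prompt + generation fits the window."""
    if not 0 < max_new_tokens < max_context:
        raise ValueError("max_new_tokens must be between 1 and max_context - 1")
    budget = max_context - max_new_tokens
    # Left-truncate: a completion model continues from the end of the prompt,
    # so the oldest tokens are the ones dropped.
    return list(token_ids[-budget:])


prompt_ids = list(range(10_000))  # stand-in for real tokenizer output
trimmed = fit_to_context(prompt_ids, max_new_tokens=256)
print(len(trimmed))
```

Left truncation is one possible policy; for few-shot prompts it may be preferable to drop whole examples instead of cutting mid-example.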
31
+ For best performance on a given task, users are encouraged to customize the model using the [NeMo Framework](https://docs.nvidia.com/nemo-framework/index.html) suite of customization tools including Parameter-Efficient Fine-Tuning (P-tuning, Adapters, LoRA, and more), and Model Alignment (SFT, SteerLM, RLHF, and more) using [NeMo-Aligner](https://github.com/NVIDIA/NeMo-Aligner).
32
+
33
+ This model is for research and development only.
34
+
35
+ This model is part of the Nemotron-H Collection. You can find the models in this family here:
36
+ - [Nemotron-H-56B-Base-8K](https://huggingface.co/nvidia/Nemotron-H-56B-Base-8K)
37
+ - [Nemotron-H-47B-Base-8K](https://huggingface.co/nvidia/Nemotron-H-47B-Base-8K)
38
+ - [Nemotron-H-8B-Base-8K](https://huggingface.co/nvidia/Nemotron-H-8B-Base-8K)
39
+
40
+
41
+ ## License/Terms of Use
42
+
43
+ GOVERNING TERMS: Use of this model is governed by the [NVIDIA Internal Scientific Research and Development Model License](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-internal-scientific-research-and-development-model-license/).
44
+
45
  **Model Developer:** NVIDIA
46
 
47
  **Model Dates:**
 
54
 
55
  The pretraining data has a cutoff date of September 2024.
56
 
57
+ ## Use Case:
 
 
58
 
59
+ This model is intended for developers and researchers building LLMs.
60
 
61
+ ## Release Date:
62
 
63
+ 4/12/2025
 
 
64
 
65
+ ## References
 
 
66
 
67
+ - [\[2504.03624\] Nemotron-H: A Family of Accurate and Efficient Hybrid Mamba-Transformer Models](https://arxiv.org/abs/2504.03624)
68
 
69
+ ## Model Architecture
70
+ - Architecture Type: Hybrid Mamba-Transformer
71
+ - Network Architecture: Nemotron-H
72
 
73
+ This model has 47B parameters.
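As a rough back-of-the-envelope check (an illustration, not a figure published by NVIDIA), the parameter count gives a lower bound on the memory needed for the weights alone. The sketch assumes bfloat16 storage at 2 bytes per parameter and ignores activations and cache state; the helper name `weight_memory_gb` is hypothetical:

```python
def weight_memory_gb(num_params, bytes_per_param):
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


NUM_PARAMS = 47e9      # parameter count stated in this README
BF16_BYTES = 2         # assumption: weights are held in bfloat16
SINGLE_GPU_GB = 80     # capacity of one H100-80GB, for comparison

weights_gb = weight_memory_gb(NUM_PARAMS, BF16_BYTES)
fits_on_one_gpu = weights_gb <= SINGLE_GPU_GB

print(f"weights: ~{weights_gb:.0f} GB, fits on one 80 GB GPU: {fits_on_one_gpu}")
```

Under these assumptions the weights alone exceed a single 80 GB device, so the model would need to be sharded across several GPUs or quantized.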
 
 
74
 
75
  ## Input
76
  - Input Type(s): Text
 
86
  Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA’s hardware (e.g., GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.
87
 
88
  ## Software Integration
89
+ - Runtime Engine(s): NeMo 24.12
90
  - Supported Hardware Microarchitecture Compatibility: NVIDIA H100-80GB, NVIDIA A100
91
  - Operating System(s): Linux
92
 
 
97
 
98
  As this is a base model, no explicit prompt format is recommended or required.
99
 
100
+ ### Example
101
+
+ ```python
+ import torch
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+
+ model_id = "nvidia/Nemotron-H-47B-Base-8K"
+
+ # Load the tokenizer and model; trust_remote_code=True allows the custom model code in the repository to be loaded
+ tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
+ model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, trust_remote_code=True, device_map="auto")
+
+ prompt = "When was NVIDIA founded?"
+
+ # Tokenize without special tokens and move the tensors to the model's device
+ inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to(model.device)
+ outputs = model.generate(**inputs)
+ print(tokenizer.decode(outputs[0]))
+ ```
115
+
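Because a base model has no prompt format, a task is usually framed as text to be continued, for instance by placing a few worked examples ahead of the query, in the spirit of the n-shot settings in the evaluation tables below. The sketch here uses a hypothetical helper `build_few_shot_prompt` and a `Question:`/`Answer:` layout chosen purely for illustration; it is not the format used to produce the reported scores.

```python
def build_few_shot_prompt(examples, query):
    """Join (question, answer) pairs and leave the final answer open for the model."""
    blocks = [f"Question: {q}\nAnswer: {a}" for q, a in examples]
    # The last block ends at "Answer:" so the model's continuation is the answer.
    blocks.append(f"Question: {query}\nAnswer:")
    return "\n\n".join(blocks)


# Illustrative examples only; any question/answer pairs could be used.
examples = [
    ("What is 2 + 2?", "4"),
    ("What is the capital of France?", "Paris"),
]
prompt = build_few_shot_prompt(examples, "When was NVIDIA founded?")
print(prompt)
```

The resulting string can be passed to the tokenizer in place of the plain `prompt` in the example above.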
116
  ## Training, Testing, and Evaluation Datasets
117
 
118
  #Training & Testing Datasets:
 
134
  **Data Labeling for Training Datasets:**
135
  Hybrid: Automated, Human, Synthetic
136
 
137
+ #### Commonsense Understanding Evaluations:
138
 
139
  | ARC Challenge 25-shot | Hellaswag 10-shot | Winogrande 5-shot | CommonsenseQA 7-shot |
140
  |-------------|--------------|-----------------|------------------|
141
  | 94.6 | 87.9 | 83.9 | 87.3 |
142
 
143
+ - ARC-Challenge (AI2 Reasoning Challenge) - The challenge set of a benchmark of grade-school-level, multiple-choice science questions that assesses the question-answering ability of language models. [Dataset](https://huggingface.co/datasets/allenai/ai2_arc)
144
+ - Hellaswag - Tests the ability of a language model to correctly finish the provided context from a choice of possible options. [Dataset](https://huggingface.co/datasets/Rowan/hellaswag)
 
145
 
146
+ - Winogrande - Tests the ability to choose the right option for a given sentence, which requires commonsense reasoning. [Dataset](https://huggingface.co/datasets/allenai/winogrande)
147
+ - CommonsenseQA - A multiple-choice question answering dataset that requires different types of commonsense knowledge to predict the correct answers. [Dataset](https://huggingface.co/datasets/tau/commonsense_qa)
 
148
 
149
  #### Coding Evaluations:
150
 
 
152
  |-------------|--------------|-----------------|------------------|
153
  | 75.9 | 65.6| 61.0 | 56.1 |
154
 
155
+ - MBPP (Mostly Basic Python Programming Problems) - Evaluates ability to generate solutions for Python programming tasks. [Dataset](https://github.com/google-research/google-research/tree/master/mbpp)
156
+ - MBPP+ - Extended version of MBPP with additional validation. [Dataset](https://huggingface.co/datasets/evalplus/mbppplus)
157
+ - HumanEval - Tests code generation and completion abilities in Python. [Dataset](https://github.com/openai/human-eval)
 
 
158
 
159
  #### Math Evaluations:
160
 
 
162
  |--------------|------------|------------|------------|
163
  | 93.3 | 57.4 | 34.2 | 57.9 |
164
 
165
+ - GSM8K (Grade School Math 8K) - Evaluates grade school level mathematical word problem solving. [Dataset](https://github.com/openai/grade-school-math)
166
+ - MATH-500 - Tests advanced mathematical problem solving across algebra, geometry, and calculus. [Dataset](https://huggingface.co/datasets/HuggingFaceH4/MATH-500)
167
+ - MATH Lvl 5 - Only the most difficult questions from the MATH dataset. [Dataset](https://github.com/hendrycks/math)
 
 
 
169
 
170
  #### Other Evaluations:
171
 
 
174
  |------------------|------------|
175
  |83.6 | 61.8 |
176
 
177
+ - MMLU - Tests knowledge across 57 subjects including science, humanities, math and more. [Dataset](https://github.com/hendrycks/test)
178
+ - MMLU Pro - Evaluates language understanding on a broad range of challenging, reasoning-focused questions spanning 14 diverse domains.
 
179
  [Dataset](https://huggingface.co/datasets/TIGER-Lab/MMLU-Pro)
180
 
181
  ## Potential Known Risks for Usage
 
198
 
199
 
200
 
201
+