suhara committed (verified) · Commit b9aaa05 · 1 Parent(s): 75ee340

Update README.md

Files changed (1):
  1. README.md (+19 -22)
README.md CHANGED
@@ -6,27 +6,29 @@ license_link: >-
  https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-internal-scientific-research-and-development-model-license/
  pipeline_tag: text-generation
  language:
- - en
- - de
- - es
- - fr
- - it
- - ko
- - pt
- - ru
- - jp
- - zh
+ - en
+ - de
+ - es
+ - fr
+ - it
+ - ko
+ - pt
+ - ru
+ - jp
+ - zh
  tags:
- - nvidia
- - pytorch
- - nemotron-h
+ - nvidia
+ - pytorch
+ - nemotron-h
+ base_model:
+ - nvidia/Nemotron-H-56B-Base-8K
  ---

  # Nemotron-H-47B-Base-8K

  ## Model Overview

- NVIDIA Nemotron-H-47B-Base-8K is a large language model (LLM) developed by NVIDIA, designed as a completion model for a given piece of text. It uses a hybrid model architecture that consists primarily of Mamba-2 and MLP layers combined with just five Attention layers. The model is pruned and distilled from Nemotron-H-47B-Base-8K using 63B tokens, and features an 8K context length. The supported languages include: English, German, Spanish, French, Italian, Korean, Portuguese, Russian, Japanese, and Chinese. For more detailed information on the model architecture, training, and evaluation, please see the [project page](https://research.nvidia.com/labs/adlr/nemotronh/) and the [technical report](https://arxiv.org/abs/2504.03624).
+ NVIDIA Nemotron-H-47B-Base-8K is a large language model (LLM) developed by NVIDIA, designed as a completion model for a given piece of text. It uses a hybrid model architecture that consists primarily of Mamba-2 and MLP layers combined with just five Attention layers. The model is pruned and distilled from [Nemotron-H-56B-Base-8K](https://huggingface.co/nvidia/Nemotron-H-56B-Base-8K) using 63B tokens, and features an 8K context length. The supported languages include: English, German, Spanish, French, Italian, Korean, Portuguese, Russian, Japanese, and Chinese. For more detailed information on the model architecture, training, and evaluation, please see the [project page](https://research.nvidia.com/labs/adlr/nemotronh/) and the [technical report](https://arxiv.org/abs/2504.03624).

  For best performance on a given task, users are encouraged to customize the model using the [NeMo Framework](https://docs.nvidia.com/nemo-framework/index.html) suite of customization tools including Parameter-Efficient Fine-Tuning (P-tuning, Adapters, LoRA, and more), and Model Alignment (SFT, SteerLM, RLHF, and more) using [NeMo-Aligner](https://github.com/NVIDIA/NeMo-Aligner).

@@ -70,7 +72,7 @@ This model is intended for developers and researchers building LLMs.
  - Architecture Type: Hybrid Mamba-Transformer
  - Network Architecture: Nemotron-H

- This model has 47B of model parameters.
+ This model has 47B model parameters.

  ## Input
  - Input Type(s): Text
@@ -115,7 +117,7 @@ print(tokenizer.decode(outputs[0]))

  ## Training, Testing, and Evaluation Datasets

- #Training & Testing Datasets:
+ ### Training & Testing Datasets:
  The training corpus for Nemotron-H-47B-Base-8K consists of English and multilingual text (German, Spanish, French, Italian, Korean, Portuguese, Russian, Japanese, Chinese and English), as well as code. Our sources cover a variety of document types such as: webpages, dialogue, articles, and other written materials. This model was also improved using synthetic data from Qwen (Built with Qwen). The corpus spans domains including legal, math, science, finance, and more. We also include a small portion of question-answering, and alignment style data to improve model accuracies.

  **Data Collection for Training & Testing Datasets:**
@@ -193,9 +195,4 @@ NVIDIA believes Trustworthy AI is a shared responsibility and we have establishe

  For more detailed information on ethical considerations for this model, please see the Responsible Use Guide available at http://nvidia.com/nemotron-responsible-use.

- Please report security vulnerabilities or NVIDIA AI Concerns [here](https://www.nvidia.com/en-us/support/submit-security-vulnerability/).
-
-
-
-
-
+ Please report security vulnerabilities or NVIDIA AI Concerns [here](https://www.nvidia.com/en-us/support/submit-security-vulnerability/).
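
The `@@ -115,7 +117,7 @@` hunk anchors on the card's existing usage snippet (`print(tokenizer.decode(outputs[0]))`), which is not reproduced in this diff. For context, a minimal completion sketch in that spirit, assuming the checkpoint is published as `nvidia/Nemotron-H-47B-Base-8K` and loads through the standard `transformers` Auto classes with `trust_remote_code=True`; the card's own example may use different arguments:

```python
# Minimal text-completion sketch; the repo id and loading arguments below are
# assumptions, not copied from the model card's own usage example.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nvidia/Nemotron-H-47B-Base-8K"  # assumed Hugging Face repo id

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # 47B parameters: bf16 with multi-GPU sharding is a practical baseline
    device_map="auto",
    trust_remote_code=True,
)

# Base (completion) model: provide a prefix and let it continue the text.
prompt = "The Nemotron-H architecture combines Mamba-2 and attention layers so that"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0]))
```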