Update README.md

README.md (CHANGED)
@@ -6,27 +6,29 @@ license_link: >-
  https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-internal-scientific-research-and-development-model-license/
pipeline_tag: text-generation
language:
+- en
+- de
+- es
+- fr
+- it
+- ko
+- pt
+- ru
+- jp
+- zh
tags:
+- nvidia
+- pytorch
+- nemotron-h
+base_model:
+- nvidia/Nemotron-H-56B-Base-8K
---

# Nemotron-H-47B-Base-8K

## Model Overview

+NVIDIA Nemotron-H-47B-Base-8K is a large language model (LLM) developed by NVIDIA, designed as a completion model for a given piece of text. It uses a hybrid model architecture that consists primarily of Mamba-2 and MLP layers combined with just five Attention layers. The model is pruned and distilled from [Nemotron-H-56B-Base-8K](https://huggingface.co/nvidia/Nemotron-H-56B-Base-8K) using 63B tokens, and features an 8K context length. The supported languages include: English, German, Spanish, French, Italian, Korean, Portuguese, Russian, Japanese, and Chinese. For more detailed information on the model architecture, training, and evaluation, please see the [project page](https://research.nvidia.com/labs/adlr/nemotronh/) and the [technical report](https://arxiv.org/abs/2504.03624).

For best performance on a given task, users are encouraged to customize the model using the [NeMo Framework](https://docs.nvidia.com/nemo-framework/index.html) suite of customization tools including Parameter-Efficient Fine-Tuning (P-tuning, Adapters, LoRA, and more), and Model Alignment (SFT, SteerLM, RLHF, and more) using [NeMo-Aligner](https://github.com/NVIDIA/NeMo-Aligner).
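Since the quick-start hunk further down ends with `print(tokenizer.decode(outputs[0]))`, the card evidently demonstrates plain text completion through `transformers`. The following is a minimal sketch of that flow rather than the card's exact snippet; the checkpoint id, `trust_remote_code`, dtype, and `device_map` settings are assumptions here.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load tokenizer and model; the custom Nemotron-H architecture is assumed to ship
# with remote modeling code, hence trust_remote_code=True.
model_id = "nvidia/Nemotron-H-47B-Base-8K"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# Base/completion model: provide a text prefix and let the model continue it.
prompt = "The Mamba-2 state-space layer differs from self-attention in that"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0]))
```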
@@ -70,7 +72,7 @@
- Architecture Type: Hybrid Mamba-Transformer
- Network Architecture: Nemotron-H

+This model has 47B model parameters.

## Input
- Input Type(s): Text
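As a rough sanity check on the 47B figure, the architecture can be instantiated from its config alone (no weights downloaded) and its parameters counted. This is a generic `transformers`/PyTorch sketch, not part of the model card, and the checkpoint id is assumed.

```python
import torch
from transformers import AutoConfig, AutoModelForCausalLM

# Build the network on the "meta" device: parameter shapes are created but no memory
# is allocated and no weights are fetched, which is enough to count parameters.
config = AutoConfig.from_pretrained("nvidia/Nemotron-H-47B-Base-8K", trust_remote_code=True)
with torch.device("meta"):
    model = AutoModelForCausalLM.from_config(config, trust_remote_code=True)

total = sum(p.numel() for p in model.parameters())
print(f"{total / 1e9:.1f}B parameters")  # expected to come out near 47B
```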
@@ -115,7 +117,7 @@ print(tokenizer.decode(outputs[0]))

## Training, Testing, and Evaluation Datasets

+### Training & Testing Datasets:
The training corpus for Nemotron-H-47B-Base-8K consists of English and multilingual text (German, Spanish, French, Italian, Korean, Portuguese, Russian, Japanese, Chinese and English), as well as code. Our sources cover a variety of document types such as: webpages, dialogue, articles, and other written materials. This model was also improved using synthetic data from Qwen (Built with Qwen). The corpus spans domains including legal, math, science, finance, and more. We also include a small portion of question-answering, and alignment style data to improve model accuracies.

**Data Collection for Training & Testing Datasets:**
@@ -193,9 +195,4 @@ NVIDIA believes Trustworthy AI is a shared responsibility and we have established

For more detailed information on ethical considerations for this model, please see the Responsible Use Guide available at http://nvidia.com/nemotron-responsible-use.

+Please report security vulnerabilities or NVIDIA AI Concerns [here](https://www.nvidia.com/en-us/support/submit-security-vulnerability/).