Update README.md
README.md
CHANGED
@@ -52,7 +52,7 @@ print(outputs[0]["generated_text"][-1])
 
 * Base model: [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct)
 * Continued Pretraining: The training data consists of 1B tokens sourced from a pretraining corpus using per-domain upsampling based on sample length. The model was trained for 125 iterations with a sequence length of 1M and a global batch size of 8.
-* Supervised fine-tuning (SFT): 1B tokens on open-source instruction datasets across general, mathematics, and code domains.
+* Supervised fine-tuning (SFT): 1B tokens on open-source instruction datasets across general, mathematics, and code domains. We subsample the data from the ‘general_sft_stage2’ subset of [AceMath-Instruct](https://huggingface.co/datasets/nvidia/AceMath-Instruct-Training-Data).
 * Maximum context window: 1M tokens
 
 ## Evaluation Results
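As a quick consistency check on the continued-pretraining bullet above, the stated hyperparameters line up with the stated token budget. This is an illustrative calculation only, assuming total tokens = iterations × sequence length × global batch size and that the "1M" sequence length means 1,000,000 tokens:

```python
# Rough token-count check for continued pretraining
# (assumption: total tokens = iterations * sequence_length * global_batch_size).
iterations = 125
sequence_length = 1_000_000   # "1M" sequence length, assumed to be 1,000,000 tokens
global_batch_size = 8

total_tokens = iterations * sequence_length * global_batch_size
print(f"{total_tokens:,}")    # 1,000,000,000 -> matches the stated ~1B SFT/pretraining token scale
```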
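The line added in this commit points the SFT data to the ‘general_sft_stage2’ subset of AceMath-Instruct. Below is a minimal sketch of how such a subsample could be drawn with the `datasets` library; the `data_dir` argument and the sample count are assumptions for illustration, not the model card's actual preprocessing:

```python
from datasets import load_dataset

# Load the 'general_sft_stage2' portion of AceMath-Instruct.
# Assumption: it is exposed as a data_dir; adjust to the dataset's actual layout.
ds = load_dataset(
    "nvidia/AceMath-Instruct-Training-Data",
    data_dir="general_sft_stage2",
    split="train",
)

# Draw a random subsample (the 100k figure is a placeholder, not the real subsample size).
subsample = ds.shuffle(seed=42).select(range(100_000))
print(subsample)
```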