Update README.md
README.md
CHANGED
@@ -52,7 +52,7 @@ print(outputs[0]["generated_text"][-1])
 
 * Base model: [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct)
 * Continued Pretraining: The training data consists of 1B tokens sourced from a pretraining corpus using per-domain upsampling based on sample length. The model was trained for 125 iterations with a sequence length of 1M and a global batch size of 8.
-* Supervised fine-tuning (SFT): 1B tokens on open-source instruction datasets across general, mathematics, and code domains.
+* Supervised fine-tuning (SFT): 1B tokens on open-source instruction datasets across general, mathematics, and code domains. We subsample the data from the ‘general_sft_stage2’ subset of [AceMath-Instruct](https://huggingface.co/datasets/nvidia/AceMath-Instruct-Training-Data).
 * Maximum context window: 1M tokens
 
 ## Evaluation Results
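As a quick consistency check on the continued-pretraining bullet above, the stated hyperparameters line up with the stated token budget. This is an illustrative calculation only, assuming total tokens = iterations × sequence length × global batch size and that the "1M" sequence length means 1,000,000 tokens:

```python
# Rough token-count check for continued pretraining
# (assumption: total tokens = iterations * sequence_length * global_batch_size).
iterations = 125
sequence_length = 1_000_000   # "1M" sequence length, assumed to be 1,000,000 tokens
global_batch_size = 8

total_tokens = iterations * sequence_length * global_batch_size
print(f"{total_tokens:,}")    # 1,000,000,000 -> matches the stated ~1B SFT/pretraining token scale
```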
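The line added in this commit points the SFT data to the ‘general_sft_stage2’ subset of AceMath-Instruct. Below is a minimal sketch of how such a subsample could be drawn with the `datasets` library; the `data_dir` argument and the sample count are assumptions for illustration, not the model card's actual preprocessing:

```python
from datasets import load_dataset

# Load the 'general_sft_stage2' portion of AceMath-Instruct.
# Assumption: it is exposed as a data_dir; adjust to the dataset's actual layout.
ds = load_dataset(
    "nvidia/AceMath-Instruct-Training-Data",
    data_dir="general_sft_stage2",
    split="train",
)

# Draw a random subsample (the 100k figure is a placeholder, not the real subsample size).
subsample = ds.shuffle(seed=42).select(range(100_000))
print(subsample)
```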