xp1992slz committed on
Commit 963b8a0 · verified · 1 Parent(s): ac0a539

Update README.md

Files changed (1): README.md (+1, -1)
README.md CHANGED
@@ -52,7 +52,7 @@ print(outputs[0]["generated_text"][-1])
 
 * Base model: [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct)
 * Continued Pretraining: The training data consists of 1B tokens sourced from a pretraining corpus using per-domain upsampling based on sample length. The model was trained for 125 iterations with a sequence length of 1M and a global batch size of 8.
-* Supervised fine-tuning (SFT): 1B tokens on open-source instruction datasets across general, mathematics, and code domains.
+* Supervised fine-tuning (SFT): 1B tokens on open-source instruction datasets across general, mathematics, and code domains. We subsample the data from the ‘general_sft_stage2’ split of [AceMath-Instruct](https://huggingface.co/datasets/nvidia/AceMath-Instruct-Training-Data).
 * Maximum context window: 1M tokens
 
 ## Evaluation Results
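
For context on the added SFT line: the bullet points at the ‘general_sft_stage2’ portion of AceMath-Instruct. Below is a minimal sketch, under stated assumptions, of how such a subsample could be drawn with the Hugging Face `datasets` library; the `data_files` pattern, the sample count, and the output filename are illustrative assumptions, not details taken from the model card or any training script.

```python
# Minimal sketch, not the authors' pipeline: randomly subsample the
# 'general_sft_stage2' portion of AceMath-Instruct. The data_files pattern,
# sample budget, and output path below are assumptions for illustration.
from datasets import load_dataset

ds = load_dataset(
    "nvidia/AceMath-Instruct-Training-Data",
    data_files="general_sft_stage2/*",  # assumed repository layout
    split="train",
)

n_samples = min(200_000, len(ds))  # illustrative budget, not from the card
subset = ds.shuffle(seed=42).select(range(n_samples))
subset.to_json("acemath_general_sft_stage2_subsample.jsonl")
```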