Update README.md
README.md CHANGED
@@ -112,7 +112,7 @@ deepspeed pretrain.py --deepspeed --deepspeed_config models/deepspeed_config.json \
                       --dataset_path corpora/cluecorpussmall_lm_seq128_dataset.pt \
                       --vocab_path models/google_zh_vocab.txt \
                       --config_path models/gpt2/xlarge_config.json \
-                      --output_model_path models/
+                      --output_model_path models/cluecorpussmall_gpt2_xlarge_seq128_model \
                       --world_size 8 --batch_size 64 \
                       --total_steps 1000000 --save_checkpoint_steps 100000 --report_steps 50000 \
                       --deepspeed_checkpoint_activations --deepspeed_checkpoint_layers_num 24
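The stage-1 command above reads its ZeRO settings from `models/deepspeed_config.json`. As a rough orientation, a config of this kind might look like the following sketch; the key names are DeepSpeed's standard config keys, but the values here are hypothetical, so use the config shipped with the repository:

```python
# Sketch of a plausible DeepSpeed config, written from Python.
# Hypothetical values -- match train_micro_batch_size_per_gpu to --batch_size.
import json

deepspeed_config = {
    "train_micro_batch_size_per_gpu": 64,  # per-GPU batch size (--batch_size 64)
    "gradient_accumulation_steps": 1,
    "fp16": {"enabled": True},             # mixed-precision training
    "zero_optimization": {"stage": 2},     # ZeRO stage 2 (stage 3 also works)
}

with open("deepspeed_config_example.json", "w") as f:
    json.dump(deepspeed_config, f, indent=2)
```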
@@ -121,8 +121,8 @@ deepspeed pretrain.py --deepspeed --deepspeed_config models/deepspeed_config.json \
 Before stage2, we extract fp32 consolidated weights from a ZeRO stage 2 or stage 3 DeepSpeed checkpoint:
 
 ```
-python3 models/
-models/
+python3 models/cluecorpussmall_gpt2_xlarge_seq128_model/zero_to_fp32.py models/cluecorpussmall_gpt2_xlarge_seq128_model/ \
+                                                                        models/cluecorpussmall_gpt2_xlarge_seq128_model.bin
 ```
 
 Stage2:
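`zero_to_fp32.py` is the consolidation helper that DeepSpeed saves into the checkpoint directory itself; it merges the per-rank ZeRO partitions into a single fp32 state dict. The same step can be done in-process through DeepSpeed's public API, as in this minimal sketch (assuming a recent DeepSpeed install; the paths mirror the command above):

```python
# In-process equivalent of running zero_to_fp32.py, as a sketch.
import torch
from deepspeed.utils.zero_to_fp32 import get_fp32_state_dict_from_zero_checkpoint

# Consolidate the partitioned ZeRO checkpoint into one fp32 state dict.
state_dict = get_fp32_state_dict_from_zero_checkpoint(
    "models/cluecorpussmall_gpt2_xlarge_seq128_model/"
)
torch.save(state_dict, "models/cluecorpussmall_gpt2_xlarge_seq128_model.bin")
```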
@@ -139,8 +139,8 @@ deepspeed pretrain.py --deepspeed --deepspeed_config models/deepspeed_config.json \
                       --dataset_path corpora/cluecorpussmall_lm_seq1024_dataset.pt \
                       --vocab_path models/google_zh_vocab.txt \
                       --config_path models/gpt2/xlarge_config.json \
-                      --pretrained_model_path models/
-                      --output_model_path models/
+                      --pretrained_model_path models/cluecorpussmall_gpt2_xlarge_seq128_model.bin \
+                      --output_model_path models/cluecorpussmall_gpt2_xlarge_seq1024_model \
                       --world_size 8 --batch_size 16 --learning_rate 5e-5 \
                       --total_steps 250000 --save_checkpoint_steps 50000 --report_steps 10000 \
                       --deepspeed_checkpoint_activations --deepspeed_checkpoint_layers_num 6
@@ -149,14 +149,14 @@ deepspeed pretrain.py --deepspeed --deepspeed_config models/deepspeed_config.json \
 Then, we extract fp32 consolidated weights from a ZeRO stage 2 or stage 3 DeepSpeed checkpoint:
 
 ```
-python3 models/
-models/
+python3 models/cluecorpussmall_gpt2_xlarge_seq1024_model/zero_to_fp32.py models/cluecorpussmall_gpt2_xlarge_seq1024_model/ \
+                                                                         models/cluecorpussmall_gpt2_xlarge_seq1024_model.bin
 ```
 
 Finally, we convert the pre-trained model into Huggingface's format:
 
 ```
-python3 scripts/convert_gpt2_from_tencentpretrain_to_huggingface.py --input_model_path models/
+python3 scripts/convert_gpt2_from_tencentpretrain_to_huggingface.py --input_model_path models/cluecorpussmall_gpt2_xlarge_seq1024_model.bin \
                                                                     --output_model_path pytorch_model.bin \
                                                                     --layers_num 48
 ```
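Once `pytorch_model.bin` is in Huggingface's format, it can be loaded with the `transformers` library. A sketch, assuming the usual GPT-2 xlarge shape (48 layers, hidden size 1600, 25 heads) and a 21128-token Chinese vocabulary; verify these numbers against `models/gpt2/xlarge_config.json` before relying on them:

```python
# Sketch: load the converted checkpoint into a Huggingface GPT-2 model.
# The config values are assumptions; check them against xlarge_config.json.
import torch
from transformers import GPT2Config, GPT2LMHeadModel

config = GPT2Config(
    vocab_size=21128,   # size of models/google_zh_vocab.txt (assumed)
    n_positions=1024,   # stage-2 sequence length
    n_embd=1600, n_layer=48, n_head=25,  # GPT-2 xlarge shape (assumed)
)
model = GPT2LMHeadModel(config)
# strict=False tolerates tied weights that may be absent from the saved file.
model.load_state_dict(torch.load("pytorch_model.bin", map_location="cpu"),
                      strict=False)
model.eval()
```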