Update README.md
README.md CHANGED
@@ -112,7 +112,7 @@ deepspeed pretrain.py --deepspeed --deepspeed_config models/deepspeed_config.json \
                       --dataset_path corpora/cluecorpussmall_lm_seq128_dataset.pt \
                       --vocab_path models/google_zh_vocab.txt \
                       --config_path models/gpt2/xlarge_config.json \
-                      --output_model_path models/
+                      --output_model_path models/cluecorpussmall_gpt2_xlarge_seq128_model \
                       --world_size 8 --batch_size 64 \
                       --total_steps 1000000 --save_checkpoint_steps 100000 --report_steps 50000 \
                       --deepspeed_checkpoint_activations --deepspeed_checkpoint_layers_num 24
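The stage-1 command above reads its ZeRO settings from `models/deepspeed_config.json`. As a rough orientation, a config of this kind might look like the following sketch; the key names are DeepSpeed's standard config keys, but the values here are hypothetical, so use the config shipped with the repository:

```python
# Sketch of a plausible DeepSpeed config, written from Python.
# Hypothetical values -- match train_micro_batch_size_per_gpu to --batch_size.
import json

deepspeed_config = {
    "train_micro_batch_size_per_gpu": 64,  # per-GPU batch size (--batch_size 64)
    "gradient_accumulation_steps": 1,
    "fp16": {"enabled": True},             # mixed-precision training
    "zero_optimization": {"stage": 2},     # ZeRO stage 2 (stage 3 also works)
}

with open("deepspeed_config_example.json", "w") as f:
    json.dump(deepspeed_config, f, indent=2)
```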
@@ -121,8 +121,8 @@ deepspeed pretrain.py --deepspeed --deepspeed_config models/deepspeed_config.json \
 Before stage2, we extract fp32 consolidated weights from a ZeRO stage 2 or stage 3 DeepSpeed checkpoint:
 
 ```
-python3 models/
-models/
+python3 models/cluecorpussmall_gpt2_xlarge_seq128_model/zero_to_fp32.py models/cluecorpussmall_gpt2_xlarge_seq128_model/ \
+                                                                        models/cluecorpussmall_gpt2_xlarge_seq128_model.bin
 ```
 
 Stage2:
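`zero_to_fp32.py` is the consolidation helper that DeepSpeed saves into the checkpoint directory itself; it merges the per-rank ZeRO partitions into a single fp32 state dict. The same step can be done in-process through DeepSpeed's public API, as in this minimal sketch (assuming a recent DeepSpeed install; the paths mirror the command above):

```python
# In-process equivalent of running zero_to_fp32.py, as a sketch.
import torch
from deepspeed.utils.zero_to_fp32 import get_fp32_state_dict_from_zero_checkpoint

# Consolidate the partitioned ZeRO checkpoint into one fp32 state dict.
state_dict = get_fp32_state_dict_from_zero_checkpoint(
    "models/cluecorpussmall_gpt2_xlarge_seq128_model/"
)
torch.save(state_dict, "models/cluecorpussmall_gpt2_xlarge_seq128_model.bin")
```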
@@ -139,8 +139,8 @@ deepspeed pretrain.py --deepspeed --deepspeed_config models/deepspeed_config.json \
                       --dataset_path corpora/cluecorpussmall_lm_seq1024_dataset.pt \
                       --vocab_path models/google_zh_vocab.txt \
                       --config_path models/gpt2/xlarge_config.json \
-                      --pretrained_model_path models/
-                      --output_model_path models/
+                      --pretrained_model_path models/cluecorpussmall_gpt2_xlarge_seq128_model.bin \
+                      --output_model_path models/cluecorpussmall_gpt2_xlarge_seq1024_model \
                       --world_size 8 --batch_size 16 --learning_rate 5e-5 \
                       --total_steps 250000 --save_checkpoint_steps 50000 --report_steps 10000 \
                       --deepspeed_checkpoint_activations --deepspeed_checkpoint_layers_num 6
@@ -149,14 +149,14 @@ deepspeed pretrain.py --deepspeed --deepspeed_config models/deepspeed_config.json \
 Then, we extract fp32 consolidated weights from a ZeRO stage 2 or stage 3 DeepSpeed checkpoint:
 
 ```
-python3 models/
-models/
+python3 models/cluecorpussmall_gpt2_xlarge_seq1024_model/zero_to_fp32.py models/cluecorpussmall_gpt2_xlarge_seq1024_model/ \
+                                                                         models/cluecorpussmall_gpt2_xlarge_seq1024_model.bin
 ```
 
 Finally, we convert the pre-trained model into Huggingface's format:
 
 ```
-python3 scripts/convert_gpt2_from_tencentpretrain_to_huggingface.py --input_model_path models/
+python3 scripts/convert_gpt2_from_tencentpretrain_to_huggingface.py --input_model_path models/cluecorpussmall_gpt2_xlarge_seq1024_model.bin \
                                                                     --output_model_path pytorch_model.bin \
                                                                     --layers_num 48
 ```
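Once `pytorch_model.bin` is in Huggingface's format, it can be loaded with the `transformers` library. A sketch, assuming the usual GPT-2 xlarge shape (48 layers, hidden size 1600, 25 heads) and a 21128-token Chinese vocabulary; verify these numbers against `models/gpt2/xlarge_config.json` before relying on them:

```python
# Sketch: load the converted checkpoint into a Huggingface GPT-2 model.
# The config values are assumptions; check them against xlarge_config.json.
import torch
from transformers import GPT2Config, GPT2LMHeadModel

config = GPT2Config(
    vocab_size=21128,   # size of models/google_zh_vocab.txt (assumed)
    n_positions=1024,   # stage-2 sequence length
    n_embd=1600, n_layer=48, n_head=25,  # GPT-2 xlarge shape (assumed)
)
model = GPT2LMHeadModel(config)
# strict=False tolerates tied weights that may be absent from the saved file.
model.load_state_dict(torch.load("pytorch_model.bin", map_location="cpu"),
                      strict=False)
model.eval()
```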