uer committed on
Commit
c485313
1 Parent(s): 459381f

Update README.md

Files changed (1)
  1. README.md +29 -29
README.md CHANGED
@@ -40,53 +40,53 @@ The model is pre-trained by [UER-py](https://github.com/dbiir/UER-py/) on [Tence
  Stage1:
 
  ```
- python3 preprocess.py --corpus_path corpora/cluecorpussmall.txt \\
- --vocab_path models/google_zh_vocab.txt \\
- --dataset_path cluecorpussmall_lm_seq128_dataset.pt \\
+ python3 preprocess.py --corpus_path corpora/cluecorpussmall.txt \
+ --vocab_path models/google_zh_vocab.txt \
+ --dataset_path cluecorpussmall_lm_seq128_dataset.pt \
  --seq_length 128 --processes_num 32 --target lm
  ```
 
  ```
- python3 pretrain.py --dataset_path cluecorpussmall_lm_seq128_dataset.pt \\
- --vocab_path models/google_zh_vocab.txt \\
- --config_path models/gpt2/distil_config.json \\
- --output_model_path models/cluecorpussmall_gpt2_distil_seq128_model.bin \\
- --world_size 8 --gpu_ranks 0 1 2 3 4 5 6 7 \\
- --total_steps 1000000 --save_checkpoint_steps 100000 --report_steps 50000 \\
- --learning_rate 1e-4 --batch_size 64 \\
- --embedding word_pos --remove_embedding_layernorm \\
- --encoder transformer --mask causal --layernorm_positioning pre \\
- --target lm --tie_weight
+ python3 pretrain.py --dataset_path cluecorpussmall_lm_seq128_dataset.pt \
+ --vocab_path models/google_zh_vocab.txt \
+ --config_path models/gpt2/distil_config.json \
+ --output_model_path models/cluecorpussmall_gpt2_distil_seq128_model.bin \
+ --world_size 8 --gpu_ranks 0 1 2 3 4 5 6 7 \
+ --total_steps 1000000 --save_checkpoint_steps 100000 --report_steps 50000 \
+ --learning_rate 1e-4 --batch_size 64 \
+ --embedding word_pos --remove_embedding_layernorm \
+ --encoder transformer --mask causal --layernorm_positioning pre \
+ --target lm --tie_weights
  ```
 
  Stage2:
 
  ```
- python3 preprocess.py --corpus_path corpora/cluecorpussmall.txt \\
- --vocab_path models/google_zh_vocab.txt \\
- --dataset_path cluecorpussmall_lm_seq1024_dataset.pt \\
+ python3 preprocess.py --corpus_path corpora/cluecorpussmall.txt \
+ --vocab_path models/google_zh_vocab.txt \
+ --dataset_path cluecorpussmall_lm_seq1024_dataset.pt \
  --seq_length 1024 --processes_num 32 --target lm
  ```
 
  ```
- python3 pretrain.py --dataset_path cluecorpussmall_lm_seq1024_dataset.pt \\
- --pretrained_model_path models/cluecorpussmall_gpt2_distil_seq128_model.bin-1000000 \\
- --vocab_path models/google_zh_vocab.txt \\
- --config_path models/gpt2/distil_config.json \\
- --output_model_path models/cluecorpussmall_gpt2_distil_seq1024_model.bin \\
- --world_size 8 --gpu_ranks 0 1 2 3 4 5 6 7 \\
- --total_steps 250000 --save_checkpoint_steps 50000 --report_steps 10000 \\
- --learning_rate 5e-5 --batch_size 16 \\
- --embedding word_pos --remove_embedding_layernorm \\
- --encoder transformer --mask causal --layernorm_positioning pre \\
- --target lm --tie_weight
+ python3 pretrain.py --dataset_path cluecorpussmall_lm_seq1024_dataset.pt \
+ --pretrained_model_path models/cluecorpussmall_gpt2_distil_seq128_model.bin-1000000 \
+ --vocab_path models/google_zh_vocab.txt \
+ --config_path models/gpt2/distil_config.json \
+ --output_model_path models/cluecorpussmall_gpt2_distil_seq1024_model.bin \
+ --world_size 8 --gpu_ranks 0 1 2 3 4 5 6 7 \
+ --total_steps 250000 --save_checkpoint_steps 50000 --report_steps 10000 \
+ --learning_rate 5e-5 --batch_size 16 \
+ --embedding word_pos --remove_embedding_layernorm \
+ --encoder transformer --mask causal --layernorm_positioning pre \
+ --target lm --tie_weights
  ```
 
  Finally, we convert the pre-trained model into Huggingface's format:
 
  ```
- python3 scripts/convert_gpt2_from_uer_to_huggingface.py --input_model_path cluecorpussmall_gpt2_distil_seq1024_model.bin-250000 \\
- --output_model_path pytorch_model.bin \\
+ python3 scripts/convert_gpt2_from_uer_to_huggingface.py --input_model_path cluecorpussmall_gpt2_distil_seq1024_model.bin-250000 \
+ --output_model_path pytorch_model.bin \
  --layers_num 6
  ```
 
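This commit changes the command blocks in two ways: `--tie_weight` becomes `--tie_weights`, and every trailing `\\` becomes `\`. The second fix is about shell syntax. In bash, a single backslash before the newline continues the command onto the next line, whereas `\\` is an escaped backslash: it passes a literal `\` argument, and the newline then ends the command, so a following line such as `--vocab_path models/google_zh_vocab.txt` would run as a separate command and fail. The minimal sketch below illustrates this; `show_args` is a hypothetical stand-in for `preprocess.py`/`pretrain.py` that only prints the arguments it receives.

```shell
#!/usr/bin/env bash
# show_args is a hypothetical stand-in for the training scripts:
# it prints each argument it receives in brackets, then a newline.
show_args() {
  printf '[%s]' "$@"
  printf '\n'
}

# New form: a single "\" before the newline continues the command,
# so all four arguments reach show_args.
show_args --seq_length 128 \
  --target lm
# prints: [--seq_length][128][--target][lm]

# Old form: "\\" is an escaped backslash, i.e. a literal "\" argument.
# The newline then ends the command, so flags on the next line would
# never reach show_args (in the README they would run as their own command).
show_args --seq_length 128 \\
# prints: [--seq_length][128][\]
```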
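The paths in these commands suggest how checkpoints are named: Stage 1 writes to `models/cluecorpussmall_gpt2_distil_seq128_model.bin` with `--total_steps 1000000`, and Stage 2 then loads `models/cluecorpussmall_gpt2_distil_seq128_model.bin-1000000`. Assuming UER-py appends `-<step>` to `--output_model_path` every `--save_checkpoint_steps` steps (inferred from these paths, not confirmed here), the hypothetical helper `ckpt_names` below lists the files each stage should leave behind; the last name is the checkpoint the next command consumes.

```shell
#!/usr/bin/env bash
# ckpt_names is a hypothetical helper. It assumes a checkpoint named
# "<output_model_path>-<step>" is written every save_checkpoint_steps steps.
# Usage: ckpt_names OUTPUT_MODEL_PATH SAVE_CHECKPOINT_STEPS TOTAL_STEPS
ckpt_names() {
  local path=$1 save=$2 total=$3 step
  for ((step = save; step <= total; step += save)); do
    printf '%s-%d\n' "$path" "$step"
  done
}

# Stage 1: 10 checkpoints; the last should match Stage 2's --pretrained_model_path.
ckpt_names models/cluecorpussmall_gpt2_distil_seq128_model.bin 100000 1000000 | tail -n 1

# Stage 2: 5 checkpoints; the last is the input to the conversion script.
ckpt_names models/cluecorpussmall_gpt2_distil_seq1024_model.bin 50000 250000 | tail -n 1
```

The conversion command passes `cluecorpussmall_gpt2_distil_seq1024_model.bin-250000` without the `models/` prefix used in Stage 2's `--output_model_path`, so it presumably needs to be run from inside `models/` or given the full path.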