murawaki committed
Commit 4e1b05b
1 Parent(s): afbacac

Update README.md

Files changed (1): README.md (+2 −2)
README.md CHANGED
@@ -16,7 +16,7 @@ widget:
 
 ## Model description
 
-This is a Japanese character-level GPT-2 Small language model pre-trained on Japanese Wikipedia, the Japanese portion of CC-100, and the Japanese portion of OSCAR.
+This is a Japanese character-level GPT-2 Small (90M parameters) language model pre-trained on Japanese Wikipedia, the Japanese portion of CC-100, and the Japanese portion of OSCAR.
 
 ## How to use
 
@@ -65,7 +65,7 @@ The following hyperparameters were used during pre-training:
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-06
 - weight_decay: 0.01
 - max_grad_norm: 1.0
-- max_steps: 500,000
+- max_steps: 500,000 (but terminated at *** steps)
 - warmup_steps: 10,000
 
 The eval loss was 1.60 while the eval accuracy was 0.635. The evaluation set consists of 5,000 randomly sampled documents from each of the training corpora.
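The `warmup_steps: 10,000` and `max_steps: 500,000` hyperparameters above imply a learning-rate schedule. A minimal sketch of linear warmup followed by linear decay, assuming a linear decay shape and a placeholder peak learning rate (neither is stated in the diff):

```python
def lr_at_step(step, peak_lr=1e-4, warmup_steps=10_000, max_steps=500_000):
    """Linear warmup to peak_lr over warmup_steps, then linear decay to 0.

    peak_lr and the linear-decay shape are illustrative assumptions;
    the commit only records warmup_steps and max_steps.
    """
    if step < warmup_steps:
        # Ramp up proportionally during warmup.
        return peak_lr * step / warmup_steps
    # Decay linearly from peak_lr at warmup_steps to 0 at max_steps.
    return peak_lr * max(0.0, (max_steps - step) / (max_steps - warmup_steps))
```

With these values the rate peaks at step 10,000 and reaches zero at step 500,000, which is why training "terminated at *** steps" still lands on a nonzero learning rate.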