Update README.md
README.md CHANGED
@@ -16,7 +16,7 @@ widget:
 
 ## Model description
 
-This is a Japanese character-level GPT-2 Small language model pre-trained on Japanese Wikipedia, the Japanese portion of CC-100, and the Japanese portion of OSCAR.
+This is a Japanese character-level GPT-2 Small (90M parameters) language model pre-trained on Japanese Wikipedia, the Japanese portion of CC-100, and the Japanese portion of OSCAR.
 
 ## How to use
 
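The usage snippet under "## How to use" falls outside this hunk's context lines. For reference, a minimal sketch of loading a character-level GPT-2 with the standard transformers API; the repository id here is a placeholder, not taken from this diff:

```python
# Minimal usage sketch, assuming the standard transformers API.
# "owner/gpt2-small-japanese-char" is a hypothetical repository id.
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("owner/gpt2-small-japanese-char")
model = AutoModelForCausalLM.from_pretrained("owner/gpt2-small-japanese-char")

# Character-level tokenization: each Japanese character is one token.
inputs = tokenizer("日本語の文章を", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30, do_sample=True, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```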
@@ -65,7 +65,7 @@ The following hyperparameters were used during pre-training:
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-06
 - weight_decay: 0.01
 - max_grad_norm: 1.0
-- max_steps: 500,000
+- max_steps: 500,000 (but terminated at *** steps)
 - warmup_steps: 10,000
 
 The eval loss was 1.60 while the eval accuracy was 0.635. The evaluation set consists of 5,000 randomly sampled documents from each of the training corpora.
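As a rough illustration, the hyperparameters listed in the hunk above map onto transformers TrainingArguments fields as sketched below; the mapping and the output path are assumptions, not the authors' actual training script:

```python
# Sketch of the listed hyperparameters as transformers TrainingArguments.
# Field names are real transformers API; the overall setup is assumed.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="gpt2-small-japanese-char",  # hypothetical output path
    adam_beta1=0.9,        # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-6,     # epsilon=1e-06
    weight_decay=0.01,
    max_grad_norm=1.0,
    max_steps=500_000,
    warmup_steps=10_000,
)
```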
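One sanity check on the reported numbers: if the eval loss of 1.60 is the mean cross-entropy in nats (the transformers default for causal LMs, which is an assumption here), it corresponds to a per-character perplexity of roughly exp(1.60) ≈ 4.95:

```python
# Perplexity from the reported eval loss, assuming mean cross-entropy in nats.
import math

eval_loss = 1.60
perplexity = math.exp(eval_loss)
print(f"per-character perplexity ≈ {perplexity:.2f}")  # ≈ 4.95
```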