Update README.md
README.md CHANGED
@@ -16,7 +16,7 @@ widget:
 
 ## Model description
 
-This is a Japanese character-level GPT-2 Small language model pre-trained on Japanese Wikipedia, the Japanese portion of CC-100, and the Japanese portion of OSCAR.
+This is a Japanese character-level GPT-2 Small (90M parameters) language model pre-trained on Japanese Wikipedia, the Japanese portion of CC-100, and the Japanese portion of OSCAR.
 
 ## How to use
 
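The usage snippet under "## How to use" falls outside this hunk's context lines. For reference, a minimal sketch of loading a character-level GPT-2 with the standard transformers API; the repository id here is a placeholder, not taken from this diff:

```python
# Minimal usage sketch, assuming the standard transformers API.
# "owner/gpt2-small-japanese-char" is a hypothetical repository id.
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("owner/gpt2-small-japanese-char")
model = AutoModelForCausalLM.from_pretrained("owner/gpt2-small-japanese-char")

# Character-level tokenization: each Japanese character is one token.
inputs = tokenizer("日本語の文章を", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30, do_sample=True, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```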
@@ -65,7 +65,7 @@ The following hyperparameters were used during pre-training:
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-06
 - weight_decay: 0.01
 - max_grad_norm: 1.0
-- max_steps: 500,000
+- max_steps: 500,000 (but terminated at *** steps)
 - warmup_steps: 10,000
 
 The eval loss was 1.60 while the eval accuracy was 0.635. The evaluation set consists of 5,000 randomly sampled documents from each of the training corpora.
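As a rough illustration, the hyperparameters listed in the hunk above map onto transformers TrainingArguments fields as sketched below; the mapping and the output path are assumptions, not the authors' actual training script:

```python
# Sketch of the listed hyperparameters as transformers TrainingArguments.
# Field names are real transformers API; the overall setup is assumed.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="gpt2-small-japanese-char",  # hypothetical output path
    adam_beta1=0.9,        # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-6,     # epsilon=1e-06
    weight_decay=0.01,
    max_grad_norm=1.0,
    max_steps=500_000,
    warmup_steps=10_000,
)
```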
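One sanity check on the reported numbers: if the eval loss of 1.60 is the mean cross-entropy in nats (the transformers default for causal LMs, which is an assumption here), it corresponds to a per-character perplexity of roughly exp(1.60) ≈ 4.95:

```python
# Perplexity from the reported eval loss, assuming mean cross-entropy in nats.
import math

eval_loss = 1.60
perplexity = math.exp(eval_loss)
print(f"per-character perplexity ≈ {perplexity:.2f}")  # ≈ 4.95
```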