Update README.md
README.md
CHANGED
@@ -22,7 +22,7 @@ The codes for the pretraining are available at [retarfi/language-pretraining](ht
 
 ## Model architecture
 
-
+The model architecture is the same as ELECTRA small in the [original ELECTRA implementation](https://github.com/google-research/electra); 12 layers, 256 dimensions of hidden states, and 4 attention heads.
 
 ## Training Data
 

@@ -40,7 +40,7 @@ The vocabulary size is 32768.
 
 ## Training
 
-The models are trained with the same configuration as ELECTRA small in the [original ELECTRA paper](https://arxiv.org/abs/2003.10555); 128 tokens per instance, 128 instances per batch, and 1M training steps.
+The models are trained with the same configuration as ELECTRA small in the [original ELECTRA paper](https://arxiv.org/abs/2003.10555) except size; 128 tokens per instance, 128 instances per batch, and 1M training steps.
 
 The size of the generator is the same as that of the discriminator.
 
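As a rough illustration of the architecture the added lines describe (12 layers, 256-dimensional hidden states, 4 attention heads, 32768-token vocabulary), a minimal sketch using Hugging Face `transformers` is shown below. The `embedding_size`, `intermediate_size`, and `max_position_embeddings` values are the ELECTRA-small defaults from the original implementation, assumed here rather than taken from this diff.

```python
from transformers import ElectraConfig, ElectraForPreTraining

# Discriminator configuration matching the README's description:
# 12 layers, 256-dim hidden states, 4 attention heads, 32768-token vocabulary.
# embedding_size / intermediate_size / max_position_embeddings are ELECTRA-small
# defaults from the original implementation (assumptions, not stated in this diff).
config = ElectraConfig(
    vocab_size=32768,
    num_hidden_layers=12,
    hidden_size=256,
    num_attention_heads=4,
    embedding_size=128,
    intermediate_size=1024,
    max_position_embeddings=512,
)

discriminator = ElectraForPreTraining(config)
print(sum(p.numel() for p in discriminator.parameters()))  # total parameter count
```

Per the diff's note that the generator has the same size as the discriminator, the generator would be instantiated from the same configuration.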