izumilab committed on
Commit 28bcf37 · 1 Parent(s): 6c31976

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -22,7 +22,7 @@ The codes for the pretraining are available at [retarfi/language-pretraining](ht
 
 ## Model architecture
 
-%The model architecture is the same as ELECTRA small in the [original ELECTRA implementation](https://github.com/google-research/electra); 12 layers, 256 dimensions of hidden states, and 4 attention heads.
+The model architecture is the same as ELECTRA small in the [original ELECTRA implementation](https://github.com/google-research/electra); 12 layers, 256 dimensions of hidden states, and 4 attention heads.
 
 ## Training Data
 
@@ -40,7 +40,7 @@ The vocabulary size is 32768.
 
 ## Training
 
-The models are trained with the same configuration as ELECTRA small in the [original ELECTRA paper](https://arxiv.org/abs/2003.10555); 128 tokens per instance, 128 instances per batch, and 1M training steps.
+The models are trained with the same configuration as ELECTRA small in the [original ELECTRA paper](https://arxiv.org/abs/2003.10555) except size; 128 tokens per instance, 128 instances per batch, and 1M training steps.
 
 The size of the generator is the same of the discriminator.
 
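For reference, below is a minimal sketch of how the hyperparameters stated in the updated README (12 layers, 256-dimensional hidden states, 4 attention heads, and a 32768-token vocabulary) would map onto a Hugging Face `ElectraConfig`. This is an illustrative assumption, not the configuration shipped with the checkpoint: the field names follow the `transformers` library, and anything not stated in the README is left at the library's ELECTRA-small defaults.

```python
from transformers import ElectraConfig, ElectraForPreTraining

# Sketch only: values taken from the README text shown in this diff;
# all other fields keep the transformers library defaults.
config = ElectraConfig(
    vocab_size=32768,       # "The vocabulary size is 32768."
    num_hidden_layers=12,   # 12 layers
    hidden_size=256,        # 256 dimensions of hidden states
    num_attention_heads=4,  # 4 attention heads
)

# Discriminator; per the README, the generator is the same size.
discriminator = ElectraForPreTraining(config)

# Training setup described in the README (not encoded in the config):
# 128 tokens per instance, 128 instances per batch, 1M training steps.
```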