Commit 1e2fff5 by eli4s · 1 parent: 2d56f0b

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -1,6 +1,6 @@
  This model was pretrained on the bookcorpus dataset using knowledge distillation.

- The particularity of this model is that even though it shares the same architecture as BERT, it has a hidden size of 384 (half the hidden size of BERT) and 6 attention heads (hence the same head size of BERT).
+ The particularity of this model is that even though it shares the same architecture as BERT, it has a hidden size of 256 (a third of the hidden size of BERT) and 4 attention heads (hence the same head size of BERT).

  The weights of the model were initialized by pruning the weights of bert-base-uncased.
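
The "same head size" claim in both versions of the changed line can be checked with a little arithmetic. The sketch below is illustrative only: `AttentionShape` and the three constants are hypothetical names, not part of the model's code, and the BERT figures (hidden size 768, 12 heads) are the standard bert-base values rather than something stated in this README.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttentionShape:
    """Hypothetical helper: holds the two numbers the README talks about."""

    hidden_size: int
    num_heads: int

    @property
    def head_size(self) -> int:
        # Each attention head gets an equal slice of the hidden dimension.
        if self.hidden_size % self.num_heads != 0:
            raise ValueError("hidden_size must be divisible by num_heads")
        return self.hidden_size // self.num_heads


# Standard bert-base values (assumed, not taken from the README).
BERT_BASE = AttentionShape(hidden_size=768, num_heads=12)
# Values stated on the removed (-) line.
BEFORE = AttentionShape(hidden_size=384, num_heads=6)
# Values stated on the added (+) line.
AFTER = AttentionShape(hidden_size=256, num_heads=4)

for name, shape in [("bert-base", BERT_BASE), ("before", BEFORE), ("after", AFTER)]:
    ratio = BERT_BASE.hidden_size / shape.hidden_size
    print(f"{name}: hidden={shape.hidden_size}, heads={shape.num_heads}, "
          f"head_size={shape.head_size}, bert_hidden/hidden={ratio:g}")
```

Under these assumptions, both the old and the new description give the same per-head width as bert-base, with the hidden size going from one half to one third of BERT's.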