Update README.md
README.md (CHANGED)

@@ -1,6 +1,6 @@
 This model was pretrained on the bookcorpus dataset using knowledge distillation.

-The particularity of this model is that even though it shares the same architecture as BERT, it has a hidden size of
+The particularity of this model is that even though it shares the same architecture as BERT, it has a hidden size of 256 (a third of the hidden size of BERT) and 4 attention heads (hence the same head size as BERT).

 The weights of the model were initialized by pruning the weights of bert-base-uncased.
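For readers who want to check the arithmetic in the new line, here is a minimal sketch, assuming the checkpoint follows the standard `transformers` `BertConfig` (the diff does not state the remaining config fields, such as layer count or intermediate size, so those are left at library defaults):

```python
from transformers import BertConfig, BertModel

# Sketch of the shape described above (assumed BertConfig field names;
# unspecified fields stay at transformers defaults): hidden_size 256 is a
# third of BERT-base's 768, and with 4 attention heads the per-head size
# is 256 // 4 == 64, the same as BERT-base's 768 // 12 == 64.
config = BertConfig(hidden_size=256, num_attention_heads=4)
model = BertModel(config)

# Head size matches BERT-base even though the hidden size is a third.
assert config.hidden_size // config.num_attention_heads == 64
```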