This model was pretrained on the bookcorpus dataset using knowledge distillation.

What sets this model apart is that, although it shares the same architecture as BERT, it has a hidden size of 240. Since it has 12 attention heads, the head size (20) differs from that of the BERT base model (64).
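
For reference, a minimal sketch of this shape using the `transformers` library. Only `hidden_size` and `num_attention_heads` come from this card; every other value is left at the `BertConfig` default and is an assumption, not a detail of this model.

```python
from transformers import BertConfig, BertModel

# Only hidden_size and num_attention_heads are taken from this card;
# every other value is the BertConfig default and may not match this model.
config = BertConfig(hidden_size=240, num_attention_heads=12)

# Head size = hidden_size / num_attention_heads: 240 / 12 = 20,
# versus 768 / 12 = 64 for bert-base.
print(config.hidden_size // config.num_attention_heads)  # 20

model = BertModel(config)  # randomly initialized weights with this shape
```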
The knowledge distillation was performed using multiple loss functions.
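
The card does not say which losses were combined. Purely as an illustration of what a multi-loss distillation objective can look like, the sketch below mixes a temperature-scaled soft-target term against the teacher's logits with a hard-label cross-entropy term; the temperature, weighting, and choice of terms are assumptions, not details of this model.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft-target term: KL divergence between temperature-scaled distributions.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard-target term: ordinary cross-entropy against the true labels.
    hard = F.cross_entropy(student_logits, labels)
    # Weighted combination of the two terms.
    return alpha * soft + (1.0 - alpha) * hard
```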
The weights of the model were initialized from scratch.

PS: the tokenizer is the same as the one used by the bert-base-uncased model.
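
Since the tokenizer matches bert-base-uncased, it can be loaded directly from that checkpoint; the identifier of this distilled model itself is not given here, so only the tokenizer is shown.

```python
from transformers import AutoTokenizer

# Per the note above, the tokenizer is the bert-base-uncased tokenizer.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
print(tokenizer.tokenize("This model was pretrained on bookcorpus."))
```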