This model was pretrained on the bookcorpus dataset using knowledge distillation.

What sets this model apart is that, although it shares the same architecture as BERT, it has a hidden size of 240. Since it has 12 attention heads, the head size (20) differs from that of the BERT base model (64).
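
For reference, a minimal sketch of this shape using the `transformers` library. Only `hidden_size` and `num_attention_heads` come from this card; every other value is left at the `BertConfig` default and is an assumption, not a detail of this model.

```python
from transformers import BertConfig, BertModel

# Only hidden_size and num_attention_heads are taken from this card;
# every other value is the BertConfig default and may not match this model.
config = BertConfig(hidden_size=240, num_attention_heads=12)

# Head size = hidden_size / num_attention_heads: 240 / 12 = 20,
# versus 768 / 12 = 64 for bert-base.
print(config.hidden_size // config.num_attention_heads)  # 20

model = BertModel(config)  # randomly initialized weights with this shape
```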
The knowledge distillation was performed using multiple loss functions.
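
The card does not say which losses were combined. Purely as an illustration of what a multi-loss distillation objective can look like, the sketch below mixes a temperature-scaled soft-target term against the teacher's logits with a hard-label cross-entropy term; the temperature, weighting, and choice of terms are assumptions, not details of this model.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft-target term: KL divergence between temperature-scaled distributions.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard-target term: ordinary cross-entropy against the true labels.
    hard = F.cross_entropy(student_logits, labels)
    # Weighted combination of the two terms.
    return alpha * soft + (1.0 - alpha) * hard
```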
The weights of the model were initialized from scratch.

PS: the tokenizer is the same as the one used by the bert-base-uncased model.
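
Since the tokenizer matches bert-base-uncased, it can be loaded directly from that checkpoint; the identifier of this distilled model itself is not given here, so only the tokenizer is shown.

```python
from transformers import AutoTokenizer

# Per the note above, the tokenizer is the bert-base-uncased tokenizer.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
print(tokenizer.tokenize("This model was pretrained on bookcorpus."))
```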