More information about the model
#2 by albalbalba
I am using this model (among many others) for research on the morpho-syntactic abilities of language models, and I would like to know a bit more about it. More specifically:
- What process was followed to distill this model? Was it the same distillation procedure described in the original DistilBERT paper (https://arxiv.org/pdf/1910.01108), or something different (such as the language-reduction process for mBERT described here: https://aclanthology.org/2020.sustainlp-1.16.pdf)?
- If it was a true distillation process (in the sense of the DistilBERT-style objective sketched below), what training data was used?
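
For context on what I mean by "true distillation", here is a minimal sketch of the triple-loss objective from the DistilBERT paper: soft-target KL divergence against the teacher, hard-label masked-LM cross-entropy, and cosine alignment of hidden states. This assumes PyTorch, uses dummy tensors in place of real teacher/student outputs, and the weights and temperature are illustrative, not the values actually used to train this model.

```python
# Minimal sketch of the DistilBERT-style triple loss (assumptions: PyTorch;
# dummy tensors stand in for real teacher/student model outputs).
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      student_hidden, teacher_hidden,
                      temperature=2.0, alpha=5.0, beta=2.0, gamma=1.0):
    """Soft-target KL + hard-label MLM cross-entropy + hidden-state cosine.

    The loss weights (alpha, beta, gamma) and the temperature are
    illustrative placeholders, not this model's actual training config.
    """
    # Soft-target distillation: KL between temperature-scaled distributions,
    # rescaled by T^2 to keep gradient magnitudes comparable.
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    loss_ce = F.kl_div(soft_student, soft_teacher,
                       reduction="batchmean") * temperature ** 2

    # Standard masked-LM loss on the hard labels
    # (ignore_index=-100 skips unmasked positions, as in transformers).
    loss_mlm = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)),
        labels.view(-1), ignore_index=-100)

    # Cosine alignment between student and teacher hidden states
    # (target of +1 means "make these vectors point the same way").
    flat_student = student_hidden.view(-1, student_hidden.size(-1))
    flat_teacher = teacher_hidden.view(-1, teacher_hidden.size(-1))
    target = torch.ones(flat_student.size(0))
    loss_cos = F.cosine_embedding_loss(flat_student, flat_teacher, target)

    return alpha * loss_ce + beta * loss_mlm + gamma * loss_cos

# Dummy example: batch of 2, sequence length 4, vocab 10, hidden size 8.
student_logits = torch.randn(2, 4, 10)
teacher_logits = torch.randn(2, 4, 10)
labels = torch.randint(0, 10, (2, 4))
student_hidden = torch.randn(2, 4, 8)
teacher_hidden = torch.randn(2, 4, 8)
print(distillation_loss(student_logits, teacher_logits, labels,
                        student_hidden, teacher_hidden))
```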
Thank you!!