DistilRoBERTa-base-ca

Model description

This model is a distilled version of projecte-aina/roberta-base-ca-v2.

It follows the same training procedure as DistilBERT, using the Knowledge Distillation implementation from the paper's official repository.

The resulting architecture has 6 layers, 768-dimensional embeddings and 12 attention heads, for a total of 82M parameters, considerably fewer than the 125M of a standard RoBERTa-base model. This makes the model lighter and faster than the original, at the cost of slightly lower performance.
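As a quick sanity check, the architecture can be inspected with the transformers library. This is a minimal sketch; the hub identifier below is an assumption based on the card title, so adjust it to the actual model page:

```python
from transformers import AutoConfig, AutoModelForMaskedLM

model_id = "projecte-aina/distilroberta-base-ca"  # hypothetical identifier; check the hub page

config = AutoConfig.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

print(config.num_hidden_layers)    # expected: 6
print(config.hidden_size)          # expected: 768
print(config.num_attention_heads)  # expected: 12

# Rough parameter count, expected to be around 82M
print(f"{sum(p.numel() for p in model.parameters()) / 1e6:.0f}M parameters")
```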

Training

Training procedure

This model has been trained using Knowledge Distillation, a technique for shrinking networks to a reasonable size while minimizing the loss in performance.

It consists of distilling a large language model (the teacher) into a more lightweight, energy-efficient, and production-friendly model (the student).

In this teacher-student setup, a relatively small student model is trained to mimic the behavior of the larger teacher model. As a result, the student has lower inference time and can run on commodity hardware.
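The sketch below illustrates the kind of teacher-student objective used in this style of distillation: the student is trained both on the usual masked-language-modelling labels and on the teacher's softened output distribution. It is an assumption-level illustration, not the exact official implementation, which also adds a cosine embedding loss on the hidden states.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha_ce=0.5, alpha_mlm=0.5):
    # Soft-target loss: the student matches the teacher's softened distribution.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2

    # Hard-target loss: standard MLM cross-entropy on the original labels
    # (-100 marks positions that were not masked and should be ignored).
    mlm_loss = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)),
        labels.view(-1),
        ignore_index=-100,
    )
    return alpha_ce * soft_loss + alpha_mlm * mlm_loss
```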

Training data

The training corpus consists of several corpora gathered from web crawling and public sources, as shown in the table below:

| Corpus | Size (GB) |
| --- | --- |
| Catalan Crawling | 13.00 |
| RacoCatalá | 8.10 |
| Catalan Oscar | 4.00 |
| CaWaC | 3.60 |
| Cat. General Crawling | 2.50 |
| Wikipedia | 1.10 |
| DOGC | 0.78 |
| Padicat | 0.63 |
| ACN | 0.42 |
| Nació Digital | 0.42 |
| Cat. Government Crawling | 0.24 |
| Vilaweb | 0.06 |
| Catalan Open Subtitles | 0.02 |
| Tweets | 0.02 |

Evaluation

Evaluation benchmark

This model has been fine-tuned on the downstream tasks of the Catalan Language Understanding Evaluation benchmark (CLUB), which includes the following datasets:

| Dataset | Task | Total | Train | Dev | Test |
| --- | --- | --- | --- | --- | --- |
| AnCora | NER | 13,581 | 10,628 | 1,427 | 1,526 |
| AnCora | POS | 16,678 | 13,123 | 1,709 | 1,846 |
| STS-ca | STS | 3,073 | 2,073 | 500 | 500 |
| TeCla | TC | 137,775 | 110,203 | 13,786 | 13,786 |
| TE-ca | RTE | 21,163 | 16,930 | 2,116 | 2,117 |
| CatalanQA | QA | 21,427 | 17,135 | 2,157 | 2,135 |
| XQuAD-ca | QA | - | - | - | 1,189 |
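Fine-tuning on these tasks follows the standard transformers Trainer recipe. The sketch below uses the TeCla text-classification task under stated assumptions: the model and dataset identifiers, split names, and column names ("text", "label") are hypothetical and should be checked against the actual model and dataset cards.

```python
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_id = "projecte-aina/distilroberta-base-ca"  # hypothetical identifier
dataset_id = "projecte-aina/tecla"                # hypothetical identifier

dataset = load_dataset(dataset_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

def tokenize(batch):
    # Assumes the raw text lives in a "text" column.
    return tokenizer(batch["text"], truncation=True, max_length=512)

dataset = dataset.map(tokenize, batched=True)

# Assumes the "label" column is a ClassLabel feature.
num_labels = dataset["train"].features["label"].num_classes
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=num_labels)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="tecla-distilroberta",
        num_train_epochs=3,
        per_device_train_batch_size=16,
    ),
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],  # assumes a "validation" split exists
    tokenizer=tokenizer,                 # enables dynamic padding via the default collator
)
trainer.train()
trainer.evaluate()
```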

Evaluation results

This is how it compares to its teacher when fine-tuned on the aforementioned downstream tasks:

| Model \ Task | NER (F1) | POS (F1) | STS-ca (Comb.) | TeCla (Acc.) | TE-ca (Acc.) | CatalanQA (F1/EM) | XQuAD-ca¹ (F1/EM) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| RoBERTa-base-ca-v2 | 89.29 | 98.96 | 79.07 | 74.26 | 83.14 | 89.50/76.63 | 73.64/55.42 |
| DistilRoBERTa-base-ca | 87.88 | 98.83 | 77.26 | 73.20 | 76.00 | 84.07/70.77 | 62.93/45.08 |

¹ Fine-tuned on CatalanQA and evaluated on the XQuAD-ca test set (XQuAD-ca has no training split).
