INT8 DistilBart finetuned on CNN DailyMail
Post-training dynamic quantization
This is an INT8 PyTorch model quantized with huggingface/optimum-intel using Intel® Neural Compressor.
The original FP32 model is the fine-tuned model sshleifer/distilbart-cnn-12-6.
The following linear modules (21 of 133) fall back to FP32 to keep the relative accuracy loss below 1%:
'model.decoder.layers.2.fc2', 'model.encoder.layers.11.fc2', 'model.decoder.layers.1.fc2', 'model.decoder.layers.0.fc2', 'model.decoder.layers.4.fc1', 'model.decoder.layers.3.fc2', 'model.encoder.layers.8.fc2', 'model.decoder.layers.3.fc1', 'model.encoder.layers.11.fc1', 'model.encoder.layers.0.fc2', 'model.encoder.layers.3.fc1', 'model.encoder.layers.10.fc2', 'model.decoder.layers.5.fc1', 'model.encoder.layers.1.fc2', 'model.encoder.layers.3.fc2', 'lm_head', 'model.encoder.layers.7.fc2', 'model.decoder.layers.0.fc1', 'model.encoder.layers.4.fc1', 'model.encoder.layers.10.fc1', 'model.encoder.layers.6.fc1'
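To make "dynamic quantization" and "fallback" concrete, here is a minimal conceptual sketch in pure Python. It is an illustration under simplifying assumptions (symmetric per-tensor scales, no bias, a toy 2x3 weight matrix) and is not how Intel® Neural Compressor implements quantization. All function names and values are hypothetical. The weight is quantized once ahead of time, while the activation scale is computed from each input at call time, which is what makes the scheme dynamic.

```python
# Conceptual sketch of dynamic INT8 quantization for one linear layer.
# Assumptions: symmetric per-tensor scales, no bias, toy values.

def quantize_symmetric(values, num_bits=8):
    """Map floats to signed integers using one scale for the whole tensor."""
    qmax = 2 ** (num_bits - 1) - 1              # 127 for INT8
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    quantized = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return quantized, scale


def linear_fp32(weight, x):
    """Reference float matrix-vector product y = W x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]


def prepare_int8_weight(weight):
    """Quantize the weight matrix once, before any input is seen."""
    flat = [w for row in weight for w in row]
    q_flat, scale = quantize_symmetric(flat)
    cols = len(weight[0])
    q_weight = [q_flat[i:i + cols] for i in range(0, len(q_flat), cols)]
    return q_weight, scale


def linear_int8_dynamic(q_weight, w_scale, x):
    """Quantize the activation at call time, accumulate in integers, rescale."""
    q_x, x_scale = quantize_symmetric(x)        # scale depends on this input
    acc = [sum(qw * qx for qw, qx in zip(row, q_x)) for row in q_weight]
    return [a * w_scale * x_scale for a in acc]


weight = [[0.12, -0.50, 0.33], [0.80, 0.05, -0.27]]   # toy values
x = [1.5, -2.0, 0.25]                                  # toy activation

q_weight, w_scale = prepare_int8_weight(weight)
y_fp32 = linear_fp32(weight, x)
y_int8 = linear_int8_dynamic(q_weight, w_scale, x)
max_abs_error = max(abs(a - b) for a, b in zip(y_fp32, y_int8))
print(y_fp32, y_int8, max_abs_error)
```

In this picture, a module that falls back to FP32 simply keeps using the float path (`linear_fp32` here) because its quantization error hurts the end metric too much.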
Evaluation result
| | INT8 | FP32 |
|---|---|---|
| Accuracy (eval-rougeLsum) | 41.4707 | 41.8117 |
| Model size | 722M | 1249M |
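As a quick check on the numbers in the table, the short sketch below computes the relative ROUGE-Lsum loss and the size reduction. The variable names are illustrative, and the model sizes are taken at face value in the unit shown (M).

```python
# Values copied from the evaluation table.
rouge_int8, rouge_fp32 = 41.4707, 41.8117
size_int8, size_fp32 = 722, 1249

# Relative accuracy loss of INT8 with respect to the FP32 baseline, in percent.
relative_loss_pct = (rouge_fp32 - rouge_int8) / rouge_fp32 * 100

# Size reduction in percent, and the FP32-to-INT8 size ratio.
size_reduction_pct = (1 - size_int8 / size_fp32) * 100
size_ratio = size_fp32 / size_int8

print(f"relative ROUGE-Lsum loss: {relative_loss_pct:.2f}%")
print(f"size reduction: {size_reduction_pct:.1f}% ({size_ratio:.2f}x smaller)")
```

The relative loss comes out at roughly 0.82%, which is consistent with the stated target of less than 1%. The INT8 model is about 42% smaller rather than 75% smaller, as expected when a number of modules stay in FP32.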
Load with optimum:
# transformers <= 4.23.0
from optimum.intel import INCModelForSeq2SeqLM
model_id = "Intel/distilbart-cnn-12-6-int8-dynamic"
int8_model = INCModelForSeq2SeqLM.from_pretrained(model_id)