Gulbert-ft-ita

This model performs multi-label classification of Italian legislative acts according to the subject index (taxonomy) currently adopted in the Gazzetta Ufficiale. It was obtained by fine-tuning an Italian BERT XXL model on a large corpus of legislative acts published in the Gazzetta Ufficiale from 1988 until early 2022.

Model Details

Model Description

Language(s) (NLP): Italian
License: apache-2.0
Finetuned from model: https://huggingface.co/dbmdz/bert-base-italian-xxl-uncased

Model Sources

Repository: https://huggingface.co/dhfbk
Paper: M. Rovera, A. Palmero Aprosio, F. Greco, M. Lucchese, S. Tonelli and A. Antetomaso (2023). Italian Legislative Text Classification for Gazzetta Ufficiale. In Proceedings of the Natural Legal Language Processing Workshop 2023 (NLLP 2023).
Demo: https://dh-server.fbk.eu/ipzs-ui-demo/

Uses

Direct Use

Multi-label text classification of Italian legislative acts.
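
The following is a minimal sketch of how multi-label predictions might be obtained, not a documented usage recipe. It assumes the checkpoint loads with `AutoModelForSequenceClassification`, that each label is scored independently with a sigmoid, and that a threshold of 0.5 is reasonable. The repository name `dhfbk/gulbert-ft-ita` and the helper names `decode_multilabel` and `classify` are hypothetical.

```python
import math


def decode_multilabel(logits, id2label, threshold=0.5):
    """Return (label, probability) pairs whose sigmoid score reaches the threshold."""
    picked = []
    for idx, logit in enumerate(logits):
        # Numerically stable sigmoid, applied to each label independently.
        if logit >= 0:
            prob = 1.0 / (1.0 + math.exp(-logit))
        else:
            e = math.exp(logit)
            prob = e / (1.0 + e)
        if prob >= threshold:
            picked.append((id2label[idx], prob))
    # Highest-scoring labels first.
    return sorted(picked, key=lambda pair: pair[1], reverse=True)


def classify(text, model_id="dhfbk/gulbert-ft-ita", threshold=0.5):
    """Hypothetical end-to-end call; model_id is an assumed repository name."""
    # Imported lazily so the decoding helper above has no heavy dependencies.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()
    # Truncate to the 512-token limit of BERT-style encoders.
    inputs = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()
    return decode_multilabel(logits, model.config.id2label, threshold)
```

Calling `classify("...")` with the text of an act would return the predicted subject labels with their scores, provided the assumed repository name matches the published checkpoint. The threshold can be tuned on held-out data if a different precision/recall balance is needed.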

Training Details

Training Data

The dataset used to train the model is available from our GitHub account and is further documented in the paper cited above.

Evaluation

Results

The model achieves a micro-F1 of 0.873, a macro-F1 of 0.471, and a weighted F1 of 0.864 on the test set (3-fold average).
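
To make the three averages above easier to compare, here is a small sketch of how micro-, macro-, and weighted F1 are computed from per-label counts. The counts in `toy` are invented for illustration and are not taken from the paper. They show one plausible reason for a large micro/macro gap: micro-F1 pools all decisions and is dominated by frequent labels, while macro-F1 gives every label equal weight, so poorly predicted rare labels pull it down.

```python
def f1(tp, fp, fn):
    """F1 from true positives, false positives and false negatives."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def averaged_f1(counts):
    """counts maps label -> (tp, fp, fn). Returns (micro, macro, weighted) F1."""
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    micro = f1(tp, fp, fn)  # pool all decisions, then compute one F1

    per_label = {label: f1(*c) for label, c in counts.items()}
    macro = sum(per_label.values()) / len(per_label)  # unweighted mean over labels

    # Support = number of true instances of each label (tp + fn).
    support = {label: c[0] + c[2] for label, c in counts.items()}
    total = sum(support.values())
    weighted = (
        sum(per_label[label] * support[label] for label in counts) / total
        if total
        else 0.0
    )
    return micro, macro, weighted


# Illustrative counts only: one frequent, well-predicted label and one rare, hard label.
toy = {"frequent": (90, 10, 10), "rare": (1, 3, 4)}
micro, macro, weighted = averaged_f1(toy)
print(f"micro={micro:.3f} macro={macro:.3f} weighted={weighted:.3f}")
```

With these toy counts the ordering is macro < weighted < micro, the same ordering as in the reported results.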

Citation

BibTeX:

@inproceedings{rovera-etal-2023-italian,
    title = "{I}talian Legislative Text Classification for Gazzetta Ufficiale",
    author = "Rovera, Marco  and
      Palmero Aprosio, Alessio  and
      Greco, Francesco  and
      Lucchese, Mariano  and
      Tonelli, Sara  and
      Antetomaso, Antonio",
    booktitle = "Proceedings of the Natural Legal Language Processing Workshop 2023",
    year = "2023",
    address = "Singapore",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.nllp-1.6",
    pages = "44--50"
}