A model that annotates entries in biographical dictionaries with Wikidata properties and entities. It is based on Google's mT5.

Example input text:

Anschiringer, Anton, Publizist, * 1812 Wien, † 17. 12. 1873 Reichenberg (Liberec). Erzieher im Hause des Großindustriellen...
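An entry like the one above could be passed to the model roughly as follows. This is a minimal sketch, not the authors' documented procedure. It assumes the checkpoint loads with the standard Hugging Face `transformers` seq2seq classes and that the raw entry text is the whole input, with no task prefix. The repository name `daelba/biography2wikidata` is taken from the model page. The `annotate` helper and the `max_new_tokens` value are illustrative choices.

```python
MODEL_ID = "daelba/biography2wikidata"  # repository name as shown on the model page


def annotate(entry: str, max_new_tokens: int = 512) -> str:
    """Return the model's annotated version of one dictionary entry.

    Assumption: the raw entry text is fed to the model without any prefix.
    """
    # Imported lazily so this module can be loaded without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_ID)

    # Tokenize the entry, truncating it if it exceeds the tokenizer's limit.
    inputs = tokenizer(entry, return_tensors="pt", truncation=True)

    # Default decoding; max_new_tokens is an illustrative value, not a documented one.
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `annotate(entry)` downloads the checkpoint on first use. Loading the tokenizer and model once and reusing them would be preferable when annotating many entries.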

Example output text:

{{WD|label|Anschiringer, Anton}}, {{WD|P106|Q6051619|Publizist}}, * {{WD|P569|1812}} {{WD|P19|Q1741|Wien}}, † {{WD|P570|1873-12-17|17. 12. 1873}} {{WD|P20|Q146351|Reichenberg (Liberec)}}. Erzieher im Hause des Großindustriellen...
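The annotated output can be turned into structured statements with a small parser. The sketch below infers the tag layout from the single example above, so it is an assumption rather than a documented format: a tag with two fields is read as `property|text`, with the text doubling as the value, and a tag with three fields as `property|value|text`. The names `Statement`, `parse_statements` and `strip_markup` are hypothetical helpers.

```python
import re
from typing import List, NamedTuple

# Matches one {{WD|...}} tag and captures everything after "WD|".
WD_TAG = re.compile(r"\{\{WD\|([^{}]+)\}\}")


class Statement(NamedTuple):
    prop: str   # Wikidata property ID such as "P106", or the special key "label"
    value: str  # Q-ID, normalized date, or the surface text itself
    text: str   # the text as it appears in the dictionary entry


def parse_statements(annotated: str) -> List[Statement]:
    """Extract all statements from a string annotated with {{WD|...}} tags."""
    statements = []
    for match in WD_TAG.finditer(annotated):
        fields = match.group(1).split("|", 2)
        if len(fields) == 2:
            # Two fields: the surface text also serves as the value.
            prop, text = fields
            value = text
        elif len(fields) == 3:
            prop, value, text = fields
        else:
            continue  # a tag without any "|" separator is skipped
        statements.append(Statement(prop, value, text))
    return statements


def strip_markup(annotated: str) -> str:
    """Replace every tag with its surface text, recovering the plain entry."""
    return WD_TAG.sub(lambda m: m.group(1).split("|", 2)[-1], annotated)
```

Applied to the example output, `parse_statements` yields six statements, and `strip_markup` gives back the example input text.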

Evaluation

After training on the BLGBL, vol. I dataset, the model reached a loss value of 0.3878.

A more informative measure is how many valid statements the model extracts from the input. The evaluation was performed on 100 unseen entries from BLGBL, vol. II.

| | Basic statements | Qualifier statements | Total |
|---|---|---|---|
| Ground truth | 1,209 | 572 | 1,781 |
| Valid statements by the model | 714 | 120 | 834 |
| Accuracy | 0.5906 | 0.2098 | 0.4683 |
| Loss | 0.4094 | 0.7902 | 0.5317 |

In other words, the model correctly retrieves about 59% of the basic statements and 21% of the qualifier statements, or about 47% of all statements combined.
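The accuracy and loss rows can be recomputed from the raw counts. The short sketch below assumes that accuracy is the number of valid statements divided by the ground-truth count and that the loss row is one minus that accuracy, which matches the reported figures. The function names are illustrative.

```python
# Counts copied from the evaluation table above.
GROUND_TRUTH = {"basic": 1209, "qualifier": 572}
VALID = {"basic": 714, "qualifier": 120}


def accuracy(valid: int, ground_truth: int) -> float:
    """Share of ground-truth statements that the model produced validly."""
    return valid / ground_truth


def summarize() -> dict:
    """Accuracy per statement type plus the combined total."""
    rows = {kind: accuracy(VALID[kind], GROUND_TRUTH[kind]) for kind in GROUND_TRUTH}
    rows["total"] = accuracy(sum(VALID.values()), sum(GROUND_TRUTH.values()))
    return rows


for name, acc in summarize().items():
    print(f"{name}: accuracy={acc:.4f}, loss={1 - acc:.4f}")
# basic: accuracy=0.5906, loss=0.4094
# qualifier: accuracy=0.2098, loss=0.7902
# total: accuracy=0.4683, loss=0.5317
```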

Acknowledgement

The model is a result of the project "Wikimedia versus traditional biographical encyclopedias. Overlaps, gaps, quality and future possibilities", funded by the Wikimedia Research Fund.

Computational resources were provided by the e-INFRA CZ project (ID:90254), supported by the Ministry of Education, Youth and Sports of the Czech Republic.

Model details

Model: daelba/biography2wikidata, fine-tuned from the base model google/mt5-small.

Model size: 300M params, stored as Safetensors with tensor type F32.