A model for annotating entries in biographical dictionaries with Wikidata entities and properties. Based on Google's mT5.
Example input text:
Anschiringer, Anton, Publizist, * 1812 Wien, † 17. 12. 1873 Reichenberg (Liberec). Erzieher im Hause des Großindustriellen...
Example output text:
{{WD|label|Anschiringer, Anton}}, {{WD|P106|Q6051619|Publizist}}, * {{WD|P569|1812}} {{WD|P19|Q1741|Wien}}, † {{WD|P570|1873-12-17|17. 12. 1873}} {{WD|P20|Q146351|Reichenberg (Liberec)}}. Erzieher im Hause des Großindustriellen...
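The annotated output can be read back into structured statements. Below is a minimal Python sketch, assuming every annotation has the shape `{{WD|<property or "label">|[<value>|]<surface text>}}`, as in the example above, and that annotations are never nested. This schema is inferred from the example rather than documented, and the names `parse_annotations`, `strip_annotations` and `Statement` are hypothetical, not part of the model release.

```python
import re
from typing import NamedTuple

# Matches one {{WD|...}} annotation. Assumes annotations are not nested
# and that no field contains a literal "}}".
WD_PATTERN = re.compile(r"\{\{WD\|(.*?)\}\}")


class Statement(NamedTuple):
    prop: str     # Wikidata property ID (e.g. "P19" = place of birth) or "label"
    value: str    # Q-ID, normalized date, or the literal text
    surface: str  # the text span as it appears in the dictionary entry


def parse_annotations(text: str) -> list[Statement]:
    """Extract (property, value, surface text) triples from model output."""
    statements = []
    for match in WD_PATTERN.finditer(text):
        fields = match.group(1).split("|")
        if len(fields) < 2:
            continue  # malformed annotation: skip it
        prop = fields[0]
        if len(fields) == 2:
            # e.g. {{WD|P569|1812}}: the surface text doubles as the value
            value = surface = fields[1]
        else:
            # e.g. {{WD|P19|Q1741|Wien}}: value first, surface text last
            value, surface = fields[1], fields[-1]
        statements.append(Statement(prop, value, surface))
    return statements


def strip_annotations(text: str) -> str:
    """Recover the plain entry text by keeping only the surface spans."""
    return WD_PATTERN.sub(lambda m: m.group(1).split("|")[-1], text)
```

Applied to the example output above, `parse_annotations` yields six statements (label, P106 occupation, P569 date of birth, P19 place of birth, P570 date of death, P20 place of death), and `strip_annotations` returns the original entry text.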
Evaluation
After training on BLGBL, vol. I, the model reaches a loss of 0.3878.
More relevant is how many valid statements the model can extract from the input. This was evaluated on 100 unseen entries from BLGBL, vol. II.
| | Basic statements | Qualifier statements | Total |
|---|---|---|---|
| Ground truth | 1,209 | 572 | 1,781 |
| Valid statements by the model | 714 | 120 | 834 |
| Accuracy | 0.5906 | 0.2098 | 0.4683 |
| Loss | 0.4094 | 0.7902 | 0.5317 |
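Accuracy is the share of ground-truth statements that the model reproduced validly, and loss is its complement (1 − accuracy). In other words, the model correctly retrieves about 59% of the basic statements and about 21% of the qualifier statements, or about 47% of all basic and qualifier statements combined.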
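The Accuracy and Loss rows follow directly from the counts: accuracy is the number of valid statements divided by the ground-truth count (a recall-style measure), and loss is 1 minus accuracy. The short Python check below recomputes the table from the counts; the helper name `scores` is illustrative only.

```python
# Counts from the evaluation table (100 unseen entries, BLGBL vol. II).
GROUND_TRUTH = {"basic": 1209, "qualifier": 572}
VALID = {"basic": 714, "qualifier": 120}


def scores(valid: int, truth: int) -> tuple[float, float]:
    """Return (accuracy, loss), rounded to 4 places as in the table."""
    accuracy = valid / truth
    return round(accuracy, 4), round(1 - accuracy, 4)


rows = {kind: scores(VALID[kind], GROUND_TRUTH[kind]) for kind in GROUND_TRUTH}
# The total is computed from the summed counts, not by averaging the two rates.
rows["total"] = scores(sum(VALID.values()), sum(GROUND_TRUTH.values()))

for kind, (accuracy, loss) in rows.items():
    print(f"{kind:<10} accuracy={accuracy:.4f} loss={loss:.4f}")
```

Note that the total (0.4683) is a count-weighted figure: it sits closer to the basic-statement accuracy because basic statements make up about two thirds of the ground truth.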
Acknowledgement
The model is the result of a project "Wikimedia versus traditional biographical encyclopedias. Overlaps, gaps, quality and future possibilities" funded by the Wikimedia Research Fund.
Computational resources were provided by the e-INFRA CZ project (ID:90254), supported by the Ministry of Education, Youth and Sports of the Czech Republic.
Model tree for daelba/biography2wikidata
Base model: google/mt5-small