metadata
license: mit
base_model:
  - magistermilitum/tridis_HTR
library_name: transformers
language:
  - la

Base model: magistermilitum/tridis_HTR v1

Train Lines: 15356

Eval Lines: 394

Test Lines: 2995

Epochs: 14.1667 / 20

Eval CER: 0.0544

Test CER: 0.0622

Test results with CERberus

| Metric | Value |
| --- | --- |
| Character Error Rate (%) | 6.22 |
| Number of Correct Characters | 186998 |
| Number of Substitutions | 5425 |
| Number of Insertions | 2933 |
| Number of Deletions | 3849 |
| Total Character Count | 196272 |
| Original Lines Count | 2288 |
| Discarded Lines Count | 0 |

| Block | Count | Correct | Incorrect | Correct Ratio (%) | Incorrect Ratio (%) |
| --- | --- | --- | --- | --- | --- |
| Digits | 0 | 0 | 0 | nan | nan |
| Lowercase Latin alphabet | 154731 | 147241 | 7490 | 95.16 | 4.84 |
| MUFI Glyphs | 0 | 0 | 0 | nan | nan |
| Punctuation | 9 | 4 | 5 | 44.44 | 55.56 |
| Uppercase Latin alphabet | 6883 | 6450 | 433 | 93.71 | 6.29 |
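
The reported character error rate can be cross-checked against the raw counts. The sketch below assumes the usual edit-distance definition, CER = (substitutions + insertions + deletions) / reference characters, and assumes that the total character count is the reference length (correct + substitutions + deletions). Neither assumption is taken from the CERberus documentation, but both reproduce the figures above.

```python
# Counts copied from the CERberus results above.
correct = 186998
substitutions = 5425
insertions = 2933
deletions = 3849

# Assumption: reference length = correct + substitutions + deletions
# (insertions add characters to the prediction, not to the reference).
reference_chars = correct + substitutions + deletions

# Assumption: CER = (S + I + D) / N, the usual edit-distance definition.
cer = (substitutions + insertions + deletions) / reference_chars

print(reference_chars)       # 196272
print(round(cer, 4))         # 0.0622
print(round(cer * 100, 2))   # 6.22
```

Under these assumptions the result matches both the Test CER of 0.0622 and the 6.22 reported by CERberus, which suggests the latter is a percentage.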

Fine-tuned on an Anglicana dataset consisting mainly of Middle Latin texts, with a few Middle English and Anglo-Norman sources. It contains documents from:

  • the Common Pleas (CP)
  • the Justices (JUST)

from the English Legal Court Rolls.

The model has not been extensively tested.

Errors are most frequent in punctuation: only 4 of the 9 punctuation characters in the test set were recognized correctly (44.44% correct, 55.56% incorrect), and most of these errors are missed ‧ dots.
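
The per-block ratios in the results table follow directly from the correct and incorrect counts. The sketch below recomputes them; `block_ratios` is a hypothetical helper written for illustration and is not part of CERberus. Returning `nan` for empty blocks is an assumption that mirrors the `nan` entries in the table.

```python
import math


def block_ratios(correct: int, incorrect: int) -> tuple[float, float]:
    """Return (correct %, incorrect %) for a character block, or nan if empty."""
    total = correct + incorrect
    if total == 0:
        # Assumption: empty blocks are reported as nan, as in the table.
        return (math.nan, math.nan)
    return (round(100 * correct / total, 2), round(100 * incorrect / total, 2))


# (correct, incorrect) counts copied from the results table.
blocks = {
    "Digits": (0, 0),
    "Lowercase Latin alphabet": (147241, 7490),
    "Punctuation": (4, 5),
    "Uppercase Latin alphabet": (6450, 433),
}

for name, (ok, bad) in blocks.items():
    print(name, block_ratios(ok, bad))
```

With only 9 punctuation characters in the test set, the punctuation ratio rests on a very small sample and should be read with caution.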

Potential biases are still to be identified.