---
license: mit
base_model:
- magistermilitum/tridis_HTR
library_name: transformers
language:
- la
---
Base model: **magistermilitum/tridis_HTR v1**
Train Lines: 15356
Eval Lines: 394
Test Lines: 2995
Epochs: 14.1667 / 20
Eval CER: 0.0544
Test CER: 0.0622
Test results with CERberus
| Metric | Value |
|----------------------------|---------|
| Character Error Rate (%) | 6.22 |
| Number of Correct Characters | 186998 |
| Number of Substitutions | 5425 |
| Number of Insertions | 2933 |
| Number of Deletions | 3849 |
| Total Character Count | 196272 |
| Original Lines Count | 2288 |
| Discarded Lines Count | 0 |
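
The reported Test CER can be cross-checked from the counts in the table. The sketch below assumes the usual definition, CER = (substitutions + insertions + deletions) / total reference characters, and that the total character count is the sum of correct, substituted, and deleted characters; the variable names are illustrative and not part of CERberus.

```python
# Counts copied from the CERberus results table.
correct = 186998
substitutions = 5425
insertions = 2933
deletions = 3849
total_chars = 196272

# Assumption: reference characters = correct + substituted + deleted.
assert correct + substitutions + deletions == total_chars

# Assumed definition: all edit operations divided by reference characters.
cer = (substitutions + insertions + deletions) / total_chars
cer_percent = round(cer * 100, 2)

print(f"CER = {cer:.4f} ({cer_percent}%)")
```

Under these assumptions the result matches both the Test CER of 0.0622 and the 6.22 shown in the table.
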
| Block | Count | Correct | Incorrect | Correct Ratio (%) | Incorrect Ratio (%) |
|------------------------------|---------|-----------|-------------|-----------------|-------------------|
| Digits | 0 | 0 | 0 | nan | nan |
| Lowercase Latin alphabet | 154731 | 147241 | 7490 | 95.16 | 4.84 |
| MUFI Glyphs | 0 | 0 | 0 | nan | nan |
| Punctuation | 9 | 4 | 5 | 44.44 | 55.56 |
| Uppercase Latin alphabet | 6883 | 6450 | 433 | 93.71 | 6.29 |
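
The ratio columns appear to be plain percentages of each block's character count, with `nan` where a block has no characters. A minimal sketch that recomputes them from the counts, assuming that interpretation (the `ratios` helper is hypothetical, not a CERberus function):

```python
import math

# (count, correct, incorrect) per character block, copied from the table.
blocks = {
    "Digits": (0, 0, 0),
    "Lowercase Latin alphabet": (154731, 147241, 7490),
    "MUFI Glyphs": (0, 0, 0),
    "Punctuation": (9, 4, 5),
    "Uppercase Latin alphabet": (6883, 6450, 433),
}


def ratios(count, correct, incorrect):
    """Return (correct %, incorrect %), or nan for an empty block."""
    if count == 0:
        return math.nan, math.nan
    return round(100 * correct / count, 2), round(100 * incorrect / count, 2)


block_ratios = {name: ratios(*counts) for name, counts in blocks.items()}

for name, (ok, bad) in block_ratios.items():
    print(f"{name}: correct {ok}%, incorrect {bad}%")
```

Note that the punctuation figures rest on only 9 characters, so they are far less stable than the alphabet ratios.
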
Fine-tuned on an Anglicana dataset drawn from the English legal court rolls. The sources are mainly in Medieval Latin, with a few in Middle English and Anglo-Norman, and contain documents from:
- the Common Pleas (CP)
- the Justices (JUST)
The model has not been extensively tested.
Errors often occur in punctuation: only 44.44% of punctuation characters in the test set are recognized correctly (an error rate of 55.56%, 5 of 9 characters), mostly because of missed ‧ dots.
Potential biases are still to be identified.