---
license: mit
base_model:
- magistermilitum/tridis_HTR
library_name: transformers
language:
- la
---
Base model: **magistermilitum/tridis_HTR v1**

Train Lines: 15356

Eval Lines: 394

Test Lines: 2995


Epochs: 14.1667 / 20

Eval CER: 0.0544

Test CER: 0.0622


Test results with CERberus:
| Metric                       | Value   |
|------------------------------|---------|
| Character Error Rate (%)     | 6.22    |
| Number of Correct Characters | 186998  |
| Number of Substitutions      | 5425    |
| Number of Insertions         | 2933    |
| Number of Deletions          | 3849    |
| Total Character Count        | 196272  |
| Original Lines Count         | 2288    |
| Discarded Lines Count        | 0       |
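
The reported character error rate can be reproduced from the counts in the table above. The following is a minimal sketch that assumes CERberus uses the standard definition, CER = (substitutions + insertions + deletions) / reference characters; the variable names are illustrative only.

```python
# Counts copied from the CERberus table above.
correct = 186_998
substitutions = 5_425
insertions = 2_933
deletions = 3_849
total_reference_chars = 196_272

# Sanity check (assumption): the reference length equals
# correct + substituted + deleted characters.
assert correct + substitutions + deletions == total_reference_chars

# Standard CER definition: edit operations divided by reference length.
cer = (substitutions + insertions + deletions) / total_reference_chars
print(f"CER = {cer:.4f} ({cer * 100:.2f} %)")
# prints: CER = 0.0622 (6.22 %)
```

Under that assumption, the value agrees with both the "Test CER: 0.0622" line and the 6.22 in the table, which is the same figure expressed as a percentage.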

| Block                        | Count   | Correct   | Incorrect   | Correct Ratio   | Incorrect Ratio   |
|------------------------------|---------|-----------|-------------|-----------------|-------------------|
| Digits                      | 0       | 0         | 0           | nan             | nan               |
| Lowercase Latin alphabet    | 154731  | 147241    | 7490        | 95.16           | 4.84              |
| MUFI Glyphs                 | 0       | 0         | 0           | nan             | nan               |
| Punctuation                 | 9       | 4         | 5           | 44.44           | 55.56             |
| Uppercase Latin alphabet    | 6883    | 6450      | 433         | 93.71           | 6.29              |
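
The ratio columns follow directly from the count columns. As a rough sketch (the helper name `block_ratios` is hypothetical and not part of CERberus), the percentages can be recomputed like this, with empty blocks yielding `nan` as in the table:

```python
import math

def block_ratios(count, correct):
    """Return (correct %, incorrect %) for one character block."""
    if count == 0:
        # No characters of this block in the test set: ratio is undefined.
        return math.nan, math.nan
    incorrect = count - correct
    return round(correct / count * 100, 2), round(incorrect / count * 100, 2)

# (count, correct) pairs copied from the table above.
blocks = {
    "Digits": (0, 0),
    "Lowercase Latin alphabet": (154_731, 147_241),
    "Punctuation": (9, 4),
    "Uppercase Latin alphabet": (6_883, 6_450),
}

for name, (count, correct_count) in blocks.items():
    print(name, block_ratios(count, correct_count))
```

Note that the punctuation block contains only 9 characters, so its ratios rest on a very small sample.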



Fine-tuned on a dataset of Anglicana script, consisting mainly of Medieval Latin text with a few Middle English and Anglo-Norman sources. The documents come from the following series of the English Legal Court Rolls:

  - the Common Pleas (CP)
  - the Justices (JUST)

The model has not been extensively tested.

Errors are concentrated in punctuation: only 44.44% of punctuation marks (4 of 9) are recognized correctly, an error rate of 55.56%, which mostly consists of missed ‧ dots.

Potential biases are still to be identified.
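
For trying the model on a single line image, a minimal sketch follows. It assumes the checkpoint loads as a TrOCR-style `VisionEncoderDecoderModel`, as the base model appears to, and that the input is an image already cropped to one text line. `MODEL_ID` is a placeholder, since this card does not state the repository ID, and `transcribe_line` is a hypothetical helper name.

```python
# Placeholder (assumption): replace with the actual repository ID of this model.
MODEL_ID = "your-namespace/your-anglicana-model"

def transcribe_line(image_path, model_id=MODEL_ID):
    """Transcribe one cropped text-line image and return the predicted string."""
    # Imported lazily so that defining the helper does not require
    # the heavy dependencies to be installed.
    from PIL import Image
    from transformers import TrOCRProcessor, VisionEncoderDecoderModel

    processor = TrOCRProcessor.from_pretrained(model_id)
    model = VisionEncoderDecoderModel.from_pretrained(model_id)

    # TrOCR-style processors expect RGB input.
    image = Image.open(image_path).convert("RGB")
    pixel_values = processor(images=image, return_tensors="pt").pixel_values

    generated_ids = model.generate(pixel_values)
    return processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
```

Usage would look like `transcribe_line("line_0001.png")`, where the file name is illustrative. Given the punctuation results above, missing ‧ dots in the output should be expected.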