---
license: mit
base_model:
- magistermilitum/tridis_HTR
library_name: transformers
language:
- la
tags:
- trOCR
- HTR
- text_recognition
- history
- latin
---
<h1>Text Recognition Model for Essoins (England) in Latin</h1>

Part of the developments within the [Flow-Project](https://www.flow-project.net/).
Developed by Jonas Widmer, Christopher Kuhlmann, and Melvin Wilde.


Base model: **magistermilitum/tridis_HTR v1**

Train Lines: 15356

Eval Lines: 394

Test Lines: 2995


Epochs: 14.1667 / 20

Eval CER: 0.0544

Test CER: 0.0622


Test results with CERberus:
| Metric                       | Value   |
|------------------------------|---------|
| Character Error Rate (%)     | 6.22    |
| Number of Correct Characters | 186998  |
| Number of Substitutions      | 5425    |
| Number of Insertions         | 2933    |
| Number of Deletions          | 3849    |
| Total Character Count        | 196272  |
| Original Lines Count         | 2288    |
| Discarded Lines Count        | 0       |
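The summary figures are consistent with the usual definition of character error rate, CER = (S + I + D) / N, where S, I and D are substitutions, insertions and deletions and N is the number of reference characters. "Total Character Count" appears to be N, since correct characters plus substitutions plus deletions add up exactly to it. The sketch below only re-checks that arithmetic; it is not CERberus's own implementation, and the function name `character_error_rate` is illustrative.

```python
def character_error_rate(substitutions: int, insertions: int, deletions: int, reference_chars: int) -> float:
    """Return the CER in percent: (S + I + D) / N * 100."""
    if reference_chars <= 0:
        raise ValueError("reference_chars must be positive")
    return 100.0 * (substitutions + insertions + deletions) / reference_chars


# Values copied from the CERberus summary table.
S, I, D, N = 5425, 2933, 3849, 196272
correct = 186998

# Every reference character is either matched, substituted or deleted.
assert correct + S + D == N

cer = character_error_rate(S, I, D, N)
print(f"CER: {cer:.2f} %")  # CER: 6.22 %
```

Insertions do not consume reference characters, which is why they appear in the numerator of the CER but not in the identity checked by the assertion.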

| Block                    | Count  | Correct | Incorrect | Correct Ratio (%) | Incorrect Ratio (%) |
|--------------------------|--------|---------|-----------|-------------------|---------------------|
| Digits                   | 0      | 0       | 0         | nan               | nan                 |
| Lowercase Latin alphabet | 154731 | 147241  | 7490      | 95.16             | 4.84                |
| MUFI Glyphs              | 0      | 0       | 0         | nan               | nan                 |
| Punctuation              | 9      | 4       | 5         | 44.44             | 55.56               |
| Uppercase Latin alphabet | 6883   | 6450    | 433       | 93.71             | 6.29                |
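The per-block ratios are simply the correct and incorrect counts divided by the block's total count; blocks with no characters in the test set (Digits, MUFI Glyphs) have no defined ratio. A minimal sketch reproducing the block table, assuming this straightforward definition (the helper `block_ratios` is hypothetical, not part of CERberus):

```python
import math


def block_ratios(correct: int, incorrect: int) -> tuple[float, float]:
    """Return (correct %, incorrect %) for one character block; NaN if the block is empty."""
    count = correct + incorrect
    if count == 0:
        return math.nan, math.nan
    return 100.0 * correct / count, 100.0 * incorrect / count


# (correct, incorrect) per block, copied from the table.
blocks = {
    "Digits": (0, 0),
    "Lowercase Latin alphabet": (147241, 7490),
    "MUFI Glyphs": (0, 0),
    "Punctuation": (4, 5),
    "Uppercase Latin alphabet": (6450, 433),
}

for name, (ok, bad) in blocks.items():
    c, i = block_ratios(ok, bad)
    print(f"{name:<26} {c:6.2f} {i:6.2f}")
```

The "Incorrect Ratio" column is the per-block error rate; for punctuation it is 55.56 %, computed from only 9 characters.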


The handwritten texts used for training are in Latin (with some Middle English and Anglo-Norman words) and date from the 13th and 14th centuries. They come from England and were written in 'Court Hand', also known as 'Anglicana'. They stem from the 'Court of Common Pleas', the second highest court of the time, whose records deal primarily with civil disputes such as inheritances or dowries, and from the Justices, who also dealt with civil pleas but covered crown pleas as well.

The model has not been extensively tested.

Errors often occur in punctuation, which has an error rate of 55.56% (5 of 9 punctuation characters transcribed incorrectly), mostly due to missed ‧ dots. Note that the test set contains only 9 punctuation characters.

Potential biases are still to be identified.