starride-teklia committed on
Commit 21ab0a3 · verified · 1 Parent(s): 0427abc

Update README.md

  1. README.md +23 -16
README.md CHANGED
@@ -18,28 +18,26 @@ This model performs Handwritten Text Recognition in Norwegian. It was developed
 
  ## Model description
 
- The model has been trained using the PyLaia library on the [NorHand](https://zenodo.org/record/6542056) document images.
  Training images were resized with a fixed height of 128 pixels, keeping the original aspect ratio.
 
- ## Evaluation results
 
- The model achieves the following results:
-
- | set | CER (%) | WER (%) |
- | ----- | ---------- | --------- |
- | train | 2.17 | 7.65 |
- | val | 8.78 | 24.93 |
- | test | 7.94 | 24.04 |
 
 
- Results improve on validation and test sets when PyLaia is combined with a 6-gram language model.
- The language model is trained on [this text corpus](https://www.nb.no/sprakbanken/en/resource-catalogue/oai-nb-no-sbr-73/) published by the National Library of Norway.
 
- | set | CER (%) | WER (%) |
- | ----- | ---------- | --------- |
- | train | 2.40 | 8.10 |
- | val | 7.45 | 19.75 |
- | test | 6.55 | 18.2 |
 
 
  ## How to use
@@ -48,6 +46,15 @@ Please refer to the [documentation](https://atr.pages.teklia.com/pylaia/).
 
  # Cite us!
 
  ```bibtex
  @inproceedings{10.1007/978-3-031-06555-2_27,
  author = {Maarand, Martin and Beyer, Yngvil and K\r{a}sen, Andre and Fosseide, Knut T. and Kermorvant, Christopher},
 
 
  ## Model description
 
+ The model has been trained using the PyLaia library on the [NorHand v1](https://zenodo.org/record/6542056) document images.
+
  Training images were resized with a fixed height of 128 pixels, keeping the original aspect ratio.
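
The fixed-height resize described above can be sketched as follows. This is a minimal illustration, not PyLaia's preprocessing code: the helper name `resized_size` and the rounding rule are assumptions, and only the 128-pixel target height comes from the README.

```python
def resized_size(width: int, height: int, target_height: int = 128) -> tuple[int, int]:
    """Return (new_width, new_height) for a fixed-height, aspect-preserving resize.

    Hypothetical helper: the rounding rule is an assumption, not taken from PyLaia.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    # Scale the width by the same factor as the height so the aspect ratio is kept.
    new_width = max(1, round(width * target_height / height))
    return new_width, target_height


# A 1600x200 line image is scaled by 128/200.
print(resized_size(1600, 200))
```

The resulting size could then be passed to any image library's resize call; which interpolation the authors used is not stated in the README.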
 
+ | split | N horizontal lines |
+ | ----- | ------: |
+ | train | 19,653 |
+ | val | 2,286 |
+ | test | 1,793 |
 
+ An external 6-gram character language model can be used to improve recognition. The language model is trained on the text from the NorHand v1 training set.
 
+ ## Evaluation results
 
+ The model achieves the following results:
 
+ | set | Language model | CER (%) | WER (%) | N lines |
+ |:------|:---------------|:----------:|:-------:|----------:|
+ | test | no | 7.94 | 24.04 | 1,793 |
+ | test | yes | 6.55 | 18.20 | 1,793 |
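
For readers unfamiliar with the metrics in this table, CER and WER are conventionally computed as edit distance divided by reference length, at character and word level respectively. The sketch below assumes that standard definition with whitespace tokenization for words; the README does not state the exact scoring setup, and `edit_distance` and `error_rate` are hypothetical helper names.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance between two sequences (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(refs: List[str], hyps: List[str], unit: str = "char") -> float:
    """Corpus-level error rate in percent; unit is 'char' (CER) or 'word' (WER)."""
    errors = total = 0
    for ref, hyp in zip(refs, hyps):
        # Assumption: words are split on whitespace, characters include spaces.
        r = list(ref) if unit == "char" else ref.split()
        h = list(hyp) if unit == "char" else hyp.split()
        errors += edit_distance(r, h)
        total += len(r)
    if total == 0:
        raise ValueError("reference text is empty")
    return 100.0 * errors / total


refs = ["jeg har en bok"]
hyps = ["jeg har en bøk"]
print(f"CER: {error_rate(refs, hyps, 'char'):.2f}%")
print(f"WER: {error_rate(refs, hyps, 'word'):.2f}%")
```

In this toy pair, one substituted character out of 14 gives a CER of about 7.14%, and one wrong word out of four gives a WER of 25%.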
 
 
  ## How to use
@@ -48,6 +46,15 @@ Please refer to the [documentation](https://atr.pages.teklia.com/pylaia/).
 
  # Cite us!
 
+ ```bibtex
+ @inproceedings{pylaia-lib,
+ author = "Tarride, Solène and Schneider, Yoann and Generali, Marie and Boillet, Melodie and Abadie, Bastien and Kermorvant, Christopher",
+ title = "Improving Automatic Text Recognition with Language Models in the PyLaia Open-Source Library",
+ booktitle = "Submitted at ICDAR2024",
+ year = "2024"
+ }
+ ```
+
  ```bibtex
  @inproceedings{10.1007/978-3-031-06555-2_27,
  author = {Maarand, Martin and Beyer, Yngvil and K\r{a}sen, Andre and Fosseide, Knut T. and Kermorvant, Christopher},