Commit dc8331a (verified) by mboillet · 1 Parent(s): 1b3e43b

Update README.md

  1. README.md +20 -20
README.md CHANGED
@@ -23,7 +23,7 @@ This model performs Handwritten Text Recognition in Norwegian. It was developed

  ## Model description

- The model has been trained using the PyLaia library on the [NorHand](https://zenodo.org/record/6542056) document images.
+ The model has been trained using the PyLaia library on the [NorHand](https://zenodo.org/record/6542056) dataset.
  Line bounding boxes were improved using a post-processing step.

  Training images were resized with a fixed height of 128 pixels, keeping the original aspect ratio.
@@ -32,33 +32,33 @@ Training images were resized with a fixed height of 128 pixels, keeping the orig

  The model achieves the following results:

- | set | CER (%) | WER (%) |
- | ----- | ---------: | --------: |
- | train | 2.33 | 5.62 |
- | val | 8.20 | 24.75 |
- | test | 7.81 | 23.30 |
+ | set | Language model | CER (%) | WER (%) |
+ |:----- |:-------------- | -------:| -------:|
+ | train | no | 2.33 | 5.62 |
+ | train | yes | 2.62 | 6.13 |
+ | val | no | 8.20 | 24.75 |
+ | val | yes | 7.01 | 19.75 |
+ | test | no | 7.81 | 23.30 |
+ | test | yes | 6.75 | 18.22 |

- Results improve on validation and test sets when PyLaia is combined with a 6-gram language model.
- The language model is trained on [this text corpus](https://www.nb.no/sprakbanken/en/resource-catalogue/oai-nb-no-sbr-73/) published by the National Library of Norway.
-
- | set | CER (%) | WER (%) |
- | ----- | ---------: | --------: |
- | train | 2.62 | 6.13 |
- | val | 7.01 | 19.75 |
- | test | 6.75 | 18.22 |
+ An external 6-gram character language model can be used to improve recognition. The language model is trained on [this text corpus](https://www.nb.no/sprakbanken/en/resource-catalogue/oai-nb-no-sbr-73/) published by the National Library of Norway.

  ## How to use?

- Please refer to the [documentation](https://atr.pages.teklia.com/pylaia/).
+ Please refer to the [PyLaia documentation](https://atr.pages.teklia.com/pylaia/usage/prediction/) to use this model.

  # Cite us!

  ```bibtex
- @inproceedings{pylaia-lib,
- author = "Tarride, Solène and Schneider, Yoann and Generali, Marie and Boillet, Melodie and Abadie, Bastien and Kermorvant, Christopher",
- title = "Improving Automatic Text Recognition with Language Models in the PyLaia Open-Source Library",
- booktitle = "Submitted at ICDAR2024",
- year = "2024"
+ @inproceedings{pylaia2024,
+ author = {Tarride, Solène and Schneider, Yoann and Generali-Lince, Marie and Boillet, Mélodie and Abadie, Bastien and Kermorvant, Christopher},
+ title = {{Improving Automatic Text Recognition with Language Models in the PyLaia Open-Source Library}},
+ booktitle = {Document Analysis and Recognition - ICDAR 2024},
+ year = {2024},
+ publisher = {Springer Nature Switzerland},
+ address = {Cham},
+ pages = {387--404},
+ isbn = {978-3-031-70549-6}
  }
  ```

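
Reviewer note on the CER and WER columns in the results table: both are commonly defined as the edit distance between the reference and the predicted transcription, divided by the reference length, over characters and whitespace-separated words respectively. The following is a minimal sketch under that assumption. The function names `edit_distance`, `error_rate`, `cer`, and `wer` are hypothetical and are not PyLaia's API; the README does not state which normalization or tokenization was used to produce the reported numbers, so results from this sketch may not match them exactly.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (unit cost per edit)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(refs, hyps, tokenize):
    """Total edit distance over total reference length, in percent."""
    errors = 0
    total = 0
    for ref, hyp in zip(refs, hyps):
        ref_tokens, hyp_tokens = tokenize(ref), tokenize(hyp)
        errors += edit_distance(ref_tokens, hyp_tokens)
        total += len(ref_tokens)
    if total == 0:
        raise ValueError("references contain no tokens")
    return 100.0 * errors / total


def cer(refs, hyps):
    """Character error rate (%): tokens are single characters."""
    return error_rate(refs, hyps, list)


def wer(refs, hyps):
    """Word error rate (%): tokens are whitespace-separated words."""
    return error_rate(refs, hyps, str.split)


if __name__ == "__main__":
    # Made-up example lines, not taken from NorHand.
    references = ["jeg er her"]
    predictions = ["jeg er der"]
    print(f"CER: {cer(references, predictions):.2f} %")
    print(f"WER: {wer(references, predictions):.2f} %")
```

Errors are summed over all lines before dividing, so long lines weigh more than short ones. Averaging per-line rates instead is another common convention and gives different numbers.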