ml6team/byt5-base-dutch-ocr-correction

simondg committed on Jul 28, 2021

Commit c86f565 · 1 Parent(s): af74633

update README

Files changed (1)

README.md +4 -4

README.md CHANGED Viewed

@@ -1,4 +1,4 @@
-# ByT5 Dutch OCR correction
+# ByT5 Dutch OCR Correction
 This model is a finetuned byT5 model that corrects OCR mistakes found in dutch sentences. The [google/byt5-base](https://huggingface.co/google/byt5-base) model is finetuned on the dutch section of the [OSCAR](https://huggingface.co/datasets/oscar) dataset.
@@ -8,13 +8,13 @@ This model is a finetuned byT5 model that corrects OCR mistakes found in dutch s
 ```python
 from transformers import AutoTokenizer, T5ForConditionalGeneration
-example_sentence = "Een algoritme dat op basis van kunstmatige inte11i9entie vkijwe1 geautomatiseerd een Nederlandstalige tekst samenstelt."
+example_sentence = "Ben algoritme dat op ba8i8 van kunstmatige inte11i9entie vkijwel geautomatiseerd een tekst herstelt met OCR fuuten."
-tokenizer = AutoTokenizer.from_pretrained('ml6team/byt5-small-dutch-ocr-correction')
+tokenizer = AutoTokenizer.from_pretrained('ml6team/byt5-base-dutch-ocr-correction')
 model_inputs = tokenizer(example_sentence, max_length=128, truncation=True, return_tensors="pt")
-model = T5ForConditionalGeneration.from_pretrained('ml6team/byt5-small-dutch-ocr-correction')
+model = T5ForConditionalGeneration.from_pretrained('ml6team/byt5-base-dutch-ocr-correction')
 outputs = model.generate(**model_inputs, max_length=128)
 tokenizer.decode(outputs[0])
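
The snippet in the diff tokenizes with `max_length=128` and `truncation=True`. Because ByT5 works on raw bytes, that limit counts UTF-8 bytes (plus one end-of-sequence id), not words or subword pieces. The following is a minimal sketch of that accounting in plain Python, with no model download. It assumes the usual ByT5 convention of three reserved ids (pad, EOS, unk) placed before the 256 byte values; the helper names `byt5_style_ids` and `fits_in_budget` are hypothetical and are not part of `transformers` or of this repository.

```python
from typing import List

# Assumptions (not taken from this commit): ByT5 reserves ids 0=pad, 1=eos, 2=unk,
# so a byte value b is mapped to id b + 3, and one EOS id is appended to the input.
EOS_ID = 1
BYTE_OFFSET = 3


def byt5_style_ids(text: str) -> List[int]:
    """Approximate ByT5 input ids: one id per UTF-8 byte, followed by EOS."""
    return [b + BYTE_OFFSET for b in text.encode("utf-8")] + [EOS_ID]


def fits_in_budget(text: str, max_length: int = 128) -> bool:
    """Return True if the text would not be truncated at the given max_length."""
    return len(byt5_style_ids(text)) <= max_length


example_sentence = (
    "Ben algoritme dat op ba8i8 van kunstmatige inte11i9entie vkijwel "
    "geautomatiseerd een tekst herstelt met OCR fuuten."
)
ids = byt5_style_ids(example_sentence)
print(len(ids), fits_in_budget(example_sentence))
```

Under these assumptions, an input longer than 127 bytes would be cut off before correction, so longer OCR output is probably best split into sentence-sized pieces before it is passed to the model.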