Update README.md
README.md CHANGED
@@ -6,11 +6,13 @@ datasets:
 language:
 - en
 - nl
-metrics:
-- sacrebleu
 pipeline_tag: text2text-generation
 tags:
 - translation
+metrics:
+- bleu
+- chrf
+- chrf++
 widget:
 - text: ">>en<< Was het leuk?"
 ---
@@ -36,24 +38,27 @@ You can use the following code for model inference. This model was finetuned to
 ```Python
 from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, GenerationConfig

+# load tokenizer and model
 tokenizer = AutoTokenizer.from_pretrained("Michielo/mt5-small_nl-en_translation")
 model = AutoModelForSeq2SeqLM.from_pretrained("Michielo/mt5-small_nl-en_translation")

-translation_generation_config = GenerationConfig(
-    num_beams=4,
-    early_stopping=True,
-    decoder_start_token_id=0,
-    eos_token_id=model.config.eos_token_id,
-    pad_token=model.config.pad_token_id,
-)
-
-translation_generation_config.save_pretrained("/tmp", "translation_generation_config.json")
-generation_config = GenerationConfig.from_pretrained("/tmp", "translation_generation_config.json")
+# tokenize input
 inputs = tokenizer(">>en<< Your dutch text here", return_tensors="pt")
+# calculate the output
 outputs = model.generate(**inputs, generation_config=generation_config)
+# decode and print
 print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
 ```

+## Benchmarks
+| Benchmark | Score |
+|-----------|:-----:|
+| BLEU      | 51.92% |
+| chr-F     | 67.90% |
+| chr-F++   | 67.62% |
+
+
 ## License
 This project is licensed under the Apache License 2.0 - see the [LICENSE](https://www.apache.org/licenses/LICENSE-2.0) file for details.
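For reference, the updated snippet still passes `generation_config=generation_config` to `model.generate()` even though the lines that defined it were removed, so it would raise a `NameError` if run verbatim. Below is a minimal, self-contained sketch that rebuilds the config in memory from the beam-search settings in the removed lines; the input sentence reuses the widget example, and `pad_token_id` is used in place of the old `pad_token` kwarg (the standard `GenerationConfig` field name).

```Python
# Self-contained sketch of the updated inference snippet.
# The committed code no longer defines `generation_config`, so it is rebuilt
# here in memory instead of being saved to and loaded from /tmp.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, GenerationConfig

# load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("Michielo/mt5-small_nl-en_translation")
model = AutoModelForSeq2SeqLM.from_pretrained("Michielo/mt5-small_nl-en_translation")

# beam-search settings taken from the removed lines
generation_config = GenerationConfig(
    num_beams=4,
    early_stopping=True,
    decoder_start_token_id=0,
    eos_token_id=model.config.eos_token_id,
    pad_token_id=model.config.pad_token_id,
)

# tokenize input; the ">>en<<" prefix marks English as the target language
inputs = tokenizer(">>en<< Was het leuk?", return_tensors="pt")
# calculate the output
outputs = model.generate(**inputs, generation_config=generation_config)
# decode and print
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```

Building the config directly also avoids the save/load round-trip through /tmp that the old snippet used.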
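The new metrics list (`bleu`, `chrf`, `chrf++`) matches the Benchmarks table, but the card does not say how the scores were produced. The sketch below shows one plausible way to compute corpus-level BLEU, chrF and chrF++ with the sacreBLEU library (the metric previously listed in the front matter); the hypothesis and reference lists are placeholders, not the actual test set.

```Python
# Hypothetical evaluation sketch using the sacrebleu package; the sentence
# lists below are placeholders, not the model's real test data.
from sacrebleu.metrics import BLEU, CHRF

hypotheses = ["Was it fun?", "I am going home."]        # model outputs
references = [["Was it fun?", "I am going home now."]]  # one reference stream

bleu = BLEU()                  # corpus BLEU
chrf = CHRF()                  # chrF (character n-grams only)
chrf_pp = CHRF(word_order=2)   # chrF++ (adds word n-grams)

print(bleu.corpus_score(hypotheses, references))
print(chrf.corpus_score(hypotheses, references))
print(chrf_pp.corpus_score(hypotheses, references))
```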