Michielo committed on
Commit a77a2fe · verified
1 Parent(s): 87fca81

Update README.md

Files changed (1)
  1. README.md +17 -12
README.md CHANGED
@@ -6,11 +6,13 @@ datasets:
 language:
 - en
 - nl
-metrics:
-- sacrebleu
 pipeline_tag: text2text-generation
 tags:
 - translation
+metrics:
+- bleu
+- chrf
+- chrf++
 widget:
 - text: ">>en<< Was het leuk?"
 ---
@@ -36,24 +38,27 @@ You can use the following code for model inference. This model was finetuned to
 ```Python
 from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, GenerationConfig
 
+# load tokenizer and model
 tokenizer = AutoTokenizer.from_pretrained("Michielo/mt5-small_nl-en_translation")
 model = AutoModelForSeq2SeqLM.from_pretrained("Michielo/mt5-small_nl-en_translation")
 
-translation_generation_config = GenerationConfig(
-    num_beams=4,
-    early_stopping=True,
-    decoder_start_token_id=0,
-    eos_token_id=model.config.eos_token_id,
-    pad_token=model.config.pad_token_id,
-)
-
-translation_generation_config.save_pretrained("/tmp", "translation_generation_config.json")
-generation_config = GenerationConfig.from_pretrained("/tmp", "translation_generation_config.json")
+# tokenize input
 inputs = tokenizer(">>en<< Your dutch text here", return_tensors="pt")
+# calculate the output
 outputs = model.generate(**inputs, generation_config=generation_config)
+# decode and print
 print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
 ```
 
 
+## Benchmarks
+| Benchmark | Score |
+|--------------|:-----:|
+| BLEU | 51.92% |
+| chr-F | 67.90% |
+| chr-F++ | 67.62% |
+
+
+
 ## License
 This project is licensed under the Apache License 2.0 - see the [LICENSE](https://www.apache.org/licenses/LICENSE-2.0) file for details.
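
Note on the updated inference snippet: it still passes `generation_config=generation_config` to `model.generate`, but this commit removes the lines that defined `generation_config`, so the snippet as committed would likely raise a `NameError`. Below is a minimal sketch of one way to keep it runnable, assuming the beam-search settings from the removed lines (`num_beams=4`, `early_stopping=True`) are still the intended ones and passing them to `generate` directly. The helper names `add_target_prefix`, `beam_search_kwargs`, and `translate` are hypothetical and not part of the README.

```python
MODEL_ID = "Michielo/mt5-small_nl-en_translation"


def add_target_prefix(text, target=">>en<<"):
    """Prepend the target-language tag used in the README examples."""
    return f"{target} {text.strip()}"


def beam_search_kwargs():
    """Generation settings taken from the removed GenerationConfig lines (assumed still intended)."""
    return {"num_beams": 4, "early_stopping": True}


def translate(texts):
    """Translate a list of Dutch strings to English (downloads the model on first call)."""
    # Imported lazily so the helpers above can be used without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    # load tokenizer and model
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_ID)

    # tokenize input, padding so several sentences can be batched
    inputs = tokenizer(
        [add_target_prefix(t) for t in texts],
        return_tensors="pt",
        padding=True,
    )
    # calculate the output with explicit beam-search settings
    outputs = model.generate(**inputs, **beam_search_kwargs())
    # decode to plain strings
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)
```

Calling `translate(["Was het leuk?"])` returns a list with one decoded string. Passing the settings as keyword arguments avoids the round trip through `/tmp` that the removed code used; whether these values match the settings behind the reported benchmark scores is not stated in the commit.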