mongrz committed
Commit 8cc50fb (verified) · 1 parent: 1614c87

Add model, config, and tokenizer

Files changed (1): README.md +16 -11
README.md CHANGED
@@ -1,7 +1,7 @@
  ---
  library_name: transformers
  license: mit
- base_model: facebook/m2m100_418M
+ base_model: facebook/m2m100_1.2B
  tags:
  - generated_from_trainer
  metrics:
@@ -16,11 +16,11 @@ should probably proofread and complete it, then remove this comment. -->
 
  # model_output
 
- This model is a fine-tuned version of [facebook/m2m100_418M](https://huggingface.co/facebook/m2m100_418M) on an unknown dataset.
+ This model is a fine-tuned version of [facebook/m2m100_1.2B](https://huggingface.co/facebook/m2m100_1.2B) on an unknown dataset.
  It achieves the following results on the evaluation set:
- - Loss: 3.2806
- - Bleu: 89.9171
- - Gen Len: 35.906
+ - Loss: 1.7263
+ - Bleu: 93.9441
+ - Gen Len: 36.358
 
  ## Model description
 
@@ -40,23 +40,28 @@ More information needed
 
  The following hyperparameters were used during training:
  - learning_rate: 2e-05
- - train_batch_size: 16
- - eval_batch_size: 16
+ - train_batch_size: 32
+ - eval_batch_size: 32
  - seed: 42
  - gradient_accumulation_steps: 4
- - total_train_batch_size: 64
+ - total_train_batch_size: 128
  - optimizer: Use OptimizerNames.ADAFACTOR and the args are:
  No additional optimizer arguments
  - lr_scheduler_type: linear
- - num_epochs: 2
+ - num_epochs: 7
  - mixed_precision_training: Native AMP
 
  ### Training results
 
  | Training Loss | Epoch  | Step | Validation Loss | Bleu    | Gen Len |
  |:-------------:|:------:|:----:|:---------------:|:-------:|:-------:|
- | 20.8291       | 1.0    | 76   | 3.8806          | 87.1845 | 35.44   |
- | 14.0408       | 1.9801 | 150  | 3.2806          | 89.9171 | 35.906  |
+ | 23.051        | 1.0    | 38   | 4.3445          | 89.5045 | 35.746  |
+ | 15.9099       | 2.0    | 76   | 3.5044          | 91.9617 | 36.366  |
+ | 12.7846       | 3.0    | 114  | 2.8211          | 92.7676 | 36.22   |
+ | 10.3083       | 4.0    | 152  | 2.3006          | 93.675  | 36.284  |
+ | 8.4622        | 5.0    | 190  | 1.9316          | 93.6498 | 36.348  |
+ | 7.3015        | 6.0    | 228  | 1.7263          | 93.9441 | 36.358  |
+ | 6.8211        | 6.8212 | 259  | 1.6685          | 93.7274 | 36.306  |
 
 
  ### Framework versions
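For readers who want to reproduce the updated run, the hyperparameter list in the diff maps directly onto `Seq2SeqTrainingArguments`. The sketch below is a hypothetical reconstruction, not the author's actual script: the card calls the dataset "unknown", so the toy corpus, the `en`/`de` language codes, and the single-GPU assumption (32 per device × 4 accumulation steps = 128 total) are all placeholders.

```python
from datasets import Dataset
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

model_name = "facebook/m2m100_1.2B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Placeholder corpus and language pair: the card does not say what the
# model was actually trained on.
tokenizer.src_lang = "en"
tokenizer.tgt_lang = "de"
raw = Dataset.from_dict({"src": ["Hello world."], "tgt": ["Hallo Welt."]})

def preprocess(batch):
    # `text_target` tokenizes the labels with the target-language tokens.
    return tokenizer(batch["src"], text_target=batch["tgt"], truncation=True)

tokenized = raw.map(preprocess, batched=True, remove_columns=["src", "tgt"])

# Mirrors the updated hyperparameters listed in the diff above.
args = Seq2SeqTrainingArguments(
    output_dir="model_output",
    learning_rate=2e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    gradient_accumulation_steps=4,   # 32 * 4 = total_train_batch_size 128
    num_train_epochs=7,
    lr_scheduler_type="linear",
    optim="adafactor",               # OptimizerNames.ADAFACTOR, no extra args
    fp16=True,                       # mixed_precision_training: Native AMP
    seed=42,
    eval_strategy="epoch",           # `evaluation_strategy` on older versions
    predict_with_generate=True,      # needed so BLEU / Gen Len can be computed
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    eval_dataset=tokenized,          # placeholder; use a held-out split
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```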
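Once the fine-tuned weights are pushed, inference follows the standard M2M100 pattern. A minimal sketch: `mongrz/model_output` is a guess at the repo id based on the commit author and model name, and the language codes are again placeholders, since the card does not name the language pair.

```python
from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

# Hypothetical repo id (commit author + model name); adjust to the real one.
repo_id = "mongrz/model_output"
tokenizer = M2M100Tokenizer.from_pretrained(repo_id)
model = M2M100ForConditionalGeneration.from_pretrained(repo_id)

tokenizer.src_lang = "en"  # placeholder source language
inputs = tokenizer("Hello, world!", return_tensors="pt")
generated = model.generate(
    **inputs,
    # M2M100 selects the output language by forcing the first decoder token.
    forced_bos_token_id=tokenizer.get_lang_id("de"),  # placeholder target
)
print(tokenizer.batch_decode(generated, skip_special_tokens=True))
```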