---
license: mit
base_model: cointegrated/rut5-small
tags:
  - generated_from_trainer
model-index:
  - name: text-normalization-ru-new
    results: []
---

# text-normalization-ru-new

This model is a fine-tuned version of [cointegrated/rut5-small](https://huggingface.co/cointegrated/rut5-small) on an unknown dataset. It achieves the following results on the evaluation set:

- Loss: 0.0177
- Mean Distance: 0
- Max Distance: 15
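The card does not say how "Mean Distance" and "Max Distance" are defined; for a text-normalization model they are most plausibly character-level edit (Levenshtein) distances between the model's output and the reference text. Purely as an illustration of that assumed metric:

```python
def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance (insertions, deletions, substitutions)."""
    # One-row dynamic programming over the standard edit-distance recurrence.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free on a match)
            ))
        prev = curr
    return prev[-1]

# e.g. normalizing "2 кота" to "два кота" is 3 edits away from the raw input
print(levenshtein("2 кота", "два кота"))  # → 3
```

Under this reading, a mean distance of 0 with a max of 15 means almost every evaluation example is normalized exactly, with the worst example 15 edits off.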

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 0.001
- train_batch_size: 30
- eval_batch_size: 30
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 60
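Because warmup is given as a ratio rather than a fixed step count, the actual warmup length depends on the total number of optimizer steps. Using the step counts from the training-results table (3298 steps per epoch over 60 epochs), a quick sanity check of the implied transformers-style linear schedule:

```python
# Sanity check of the learning-rate schedule implied by the hyperparameters.
# Values are taken from this card; step counts match the results table.
steps_per_epoch = 3298
num_epochs = 60
warmup_ratio = 0.1
peak_lr = 0.001

total_steps = steps_per_epoch * num_epochs
warmup_steps = int(total_steps * warmup_ratio)
print(total_steps, warmup_steps)  # → 197880 19788

def lr_at(step: int) -> float:
    """Linear warmup to peak_lr, then linear decay to 0."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * (total_steps - step) / (total_steps - warmup_steps)

# Peak learning rate is reached exactly at the end of warmup.
print(round(lr_at(warmup_steps), 6))  # → 0.001
```

So the schedule warms up for the first 19,788 steps (the first six epochs) and decays linearly to zero at step 197,880.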

### Training results

| Training Loss | Epoch | Step   | Validation Loss | Mean Distance | Max Distance |
|:-------------:|:-----:|:------:|:---------------:|:-------------:|:------------:|
| 0.2236        | 1.0   | 3298   | 0.1120          | 5             | 133          |
| 0.1179        | 2.0   | 6596   | 0.0548          | 3             | 82           |
| 0.0829        | 3.0   | 9894   | 0.0425          | 1             | 46           |
| 0.0643        | 4.0   | 13192  | 0.0311          | 1             | 64           |
| 0.0538        | 5.0   | 16490  | 0.0267          | 1             | 48           |
| 0.0469        | 6.0   | 19788  | 0.0396          | 2             | 80           |
| 0.0385        | 7.0   | 23086  | 0.0262          | 2             | 73           |
| 0.0316        | 8.0   | 26384  | 0.0223          | 1             | 40           |
| 0.0263        | 9.0   | 29682  | 0.0240          | 1             | 69           |
| 0.0226        | 10.0  | 32980  | 0.0203          | 1             | 60           |
| 0.0203        | 11.0  | 36278  | 0.0177          | 1             | 54           |
| 0.0178        | 12.0  | 39576  | 0.0188          | 1             | 61           |
| 0.0154        | 13.0  | 42874  | 0.0296          | 1             | 65           |
| 0.0138        | 14.0  | 46172  | 0.0201          | 1             | 55           |
| 0.012         | 15.0  | 49470  | 0.0268          | 1             | 67           |
| 0.0109        | 16.0  | 52768  | 0.0163          | 1             | 35           |
| 0.0105        | 17.0  | 56066  | 0.0136          | 1             | 26           |
| 0.0092        | 18.0  | 59364  | 0.0202          | 1             | 65           |
| 0.0087        | 19.0  | 62662  | 0.0221          | 1             | 65           |
| 0.0075        | 20.0  | 65960  | 0.0203          | 1             | 33           |
| 0.0067        | 21.0  | 69258  | 0.0226          | 1             | 26           |
| 0.0062        | 22.0  | 72556  | 0.0184          | 1             | 24           |
| 0.0059        | 23.0  | 75854  | 0.0131          | 0             | 18           |
| 0.0054        | 24.0  | 79152  | 0.0270          | 1             | 58           |
| 0.0052        | 25.0  | 82450  | 0.0244          | 1             | 45           |
| 0.0044        | 26.0  | 85748  | 0.0149          | 1             | 23           |
| 0.0043        | 27.0  | 89046  | 0.0256          | 1             | 63           |
| 0.0038        | 28.0  | 92344  | 0.0172          | 1             | 30           |
| 0.0036        | 29.0  | 95642  | 0.0224          | 1             | 37           |
| 0.0033        | 30.0  | 98940  | 0.0194          | 1             | 30           |
| 0.0031        | 31.0  | 102238 | 0.0238          | 1             | 59           |
| 0.003         | 32.0  | 105536 | 0.0200          | 1             | 28           |
| 0.0028        | 33.0  | 108834 | 0.0161          | 0             | 18           |
| 0.0027        | 34.0  | 112132 | 0.0215          | 1             | 26           |
| 0.0025        | 35.0  | 115430 | 0.0198          | 0             | 19           |
| 0.0023        | 36.0  | 118728 | 0.0168          | 0             | 24           |
| 0.002         | 37.0  | 122026 | 0.0221          | 1             | 32           |
| 0.0019        | 38.0  | 125324 | 0.0214          | 1             | 32           |
| 0.0017        | 39.0  | 128622 | 0.0186          | 0             | 19           |
| 0.0017        | 40.0  | 131920 | 0.0171          | 0             | 23           |
| 0.0016        | 41.0  | 135218 | 0.0164          | 0             | 17           |
| 0.0015        | 42.0  | 138516 | 0.0166          | 1             | 21           |
| 0.0014        | 43.0  | 141814 | 0.0167          | 0             | 21           |
| 0.0019        | 44.0  | 145112 | 0.0192          | 1             | 32           |
| 0.0011        | 45.0  | 148410 | 0.0209          | 1             | 27           |
| 0.0011        | 46.0  | 151708 | 0.0218          | 0             | 23           |
| 0.001         | 47.0  | 155006 | 0.0195          | 0             | 25           |
| 0.0009        | 48.0  | 158304 | 0.0166          | 0             | 15           |
| 0.0008        | 49.0  | 161602 | 0.0210          | 1             | 31           |
| 0.0008        | 50.0  | 164900 | 0.0230          | 0             | 22           |
| 0.0008        | 51.0  | 168198 | 0.0184          | 0             | 15           |
| 0.0007        | 52.0  | 171496 | 0.0183          | 0             | 15           |
| 0.0006        | 53.0  | 174794 | 0.0234          | 1             | 32           |
| 0.0005        | 54.0  | 178092 | 0.0227          | 0             | 24           |
| 0.0004        | 55.0  | 181390 | 0.0188          | 0             | 15           |
| 0.0005        | 56.0  | 184688 | 0.0191          | 0             | 15           |
| 0.0004        | 57.0  | 187986 | 0.0183          | 0             | 15           |
| 0.0003        | 58.0  | 191284 | 0.0180          | 0             | 15           |
| 0.0003        | 59.0  | 194582 | 0.0180          | 0             | 15           |
| 0.0004        | 60.0  | 197880 | 0.0177          | 0             | 15           |

### Framework versions

- Transformers 4.32.1
- Pytorch 2.0.1+cu117
- Datasets 2.14.4
- Tokenizers 0.13.3