---
license: mit
base_model: cointegrated/rut5-small
tags:
  - generated_from_trainer
model-index:
  - name: text-normalization-ru-new
    results: []
---

text-normalization-ru-new

This model is a fine-tuned version of cointegrated/rut5-small on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0985
  • Mean Distance: 0
  • Max Distance: 9
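
The mean and max distance metrics are edit distances between the model's normalized output and the reference text. The exact implementation is not given in this card, so as an assumption, here is a minimal sketch using the standard character-level Levenshtein distance:

```python
def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance (insert/delete/substitute, cost 1 each)."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def distance_stats(preds, refs):
    """Aggregate per-example distances into the mean/max the card reports (assumed)."""
    d = [levenshtein(p, r) for p, r in zip(preds, refs)]
    return sum(d) / len(d), max(d)
```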

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.001
  • train_batch_size: 30
  • eval_batch_size: 30
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 30
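
With lr_scheduler_warmup_ratio 0.1 and the 77,940 total optimizer steps shown in the results table (2,598 steps per epoch × 30 epochs), linear warmup covers the first 7,794 steps. A sketch of the resulting learning-rate schedule, reproduced in plain Python with the same semantics as Transformers' linear scheduler with warmup:

```python
def linear_lr(step, base_lr=1e-3, total_steps=77_940, warmup_ratio=0.1):
    """Linear warmup from 0 to base_lr, then linear decay back to 0."""
    warmup_steps = int(total_steps * warmup_ratio)  # 7,794 here
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr * max(0.0, (total_steps - step) / (total_steps - warmup_steps))
```

At step 0 the rate is 0, it peaks at 0.001 at step 7,794, and decays linearly to 0 at step 77,940.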

Training results

| Training Loss | Epoch | Step  | Validation Loss | Mean Distance | Max Distance |
|:-------------:|:-----:|:-----:|:---------------:|:-------------:|:------------:|
| 0.181         | 1.0   | 2598  | 0.2140          | 4             | 36           |
| 0.1067        | 2.0   | 5196  | 0.1416          | 2             | 29           |
| 0.0801        | 3.0   | 7794  | 0.1098          | 2             | 22           |
| 0.0575        | 4.0   | 10392 | 0.1081          | 2             | 18           |
| 0.0452        | 5.0   | 12990 | 0.0897          | 1             | 14           |
| 0.0372        | 6.0   | 15588 | 0.0720          | 1             | 15           |
| 0.0323        | 7.0   | 18186 | 0.0840          | 1             | 12           |
| 0.0267        | 8.0   | 20784 | 0.0768          | 1             | 16           |
| 0.0231        | 9.0   | 23382 | 0.0697          | 1             | 10           |
| 0.0199        | 10.0  | 25980 | 0.0717          | 1             | 9            |
| 0.0168        | 11.0  | 28578 | 0.0812          | 1             | 16           |
| 0.0148        | 12.0  | 31176 | 0.0961          | 1             | 12           |
| 0.0128        | 13.0  | 33774 | 0.0823          | 1             | 9            |
| 0.0112        | 14.0  | 36372 | 0.0766          | 1             | 12           |
| 0.0093        | 15.0  | 38970 | 0.0713          | 1             | 9            |
| 0.0083        | 16.0  | 41568 | 0.0847          | 1             | 14           |
| 0.0076        | 17.0  | 44166 | 0.0863          | 1             | 11           |
| 0.0064        | 18.0  | 46764 | 0.0830          | 1             | 14           |
| 0.0054        | 19.0  | 49362 | 0.0884          | 1             | 11           |
| 0.0052        | 20.0  | 51960 | 0.0821          | 1             | 10           |
| 0.0045        | 21.0  | 54558 | 0.0915          | 1             | 14           |
| 0.0037        | 22.0  | 57156 | 0.0931          | 1             | 14           |
| 0.0036        | 23.0  | 59754 | 0.0941          | 1             | 9            |
| 0.0028        | 24.0  | 62352 | 0.0861          | 1             | 13           |
| 0.0026        | 25.0  | 64950 | 0.0912          | 1             | 12           |
| 0.0024        | 26.0  | 67548 | 0.0916          | 0             | 9            |
| 0.002         | 27.0  | 70146 | 0.0888          | 0             | 9            |
| 0.0017        | 28.0  | 72744 | 0.0888          | 0             | 9            |
| 0.0017        | 29.0  | 75342 | 0.0952          | 0             | 9            |
| 0.0014        | 30.0  | 77940 | 0.0985          | 0             | 9            |
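
Validation loss bottoms out at 0.0697 at epoch 9 and drifts upward afterwards while training loss keeps falling, which suggests the later epochs mostly overfit. A small illustrative sketch of picking the best epoch by validation loss from the first ten rows of the table (values hand-copied from above):

```python
# (epoch, validation_loss) pairs from the training-results table, epochs 1-10
val_loss = {1: 0.2140, 2: 0.1416, 3: 0.1098, 4: 0.1081, 5: 0.0897,
            6: 0.0720, 7: 0.0840, 8: 0.0768, 9: 0.0697, 10: 0.0717}

best_epoch = min(val_loss, key=val_loss.get)  # epoch with the lowest validation loss
```

No later epoch (11 through 30) beats the epoch-9 value of 0.0697, so a checkpoint selected on validation loss would come from mid-training rather than the end-of-training weights published here.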

Framework versions

  • Transformers 4.32.1
  • Pytorch 2.0.1+cu117
  • Datasets 2.14.4
  • Tokenizers 0.13.3