metadata

license: mit
base_model: cointegrated/rut5-small
tags:
  - generated_from_trainer
model-index:
  - name: text-normalization-ru-new
    results: []

text-normalization-ru-new

This model is a fine-tuned version of cointegrated/rut5-small. The training dataset is not specified. On the evaluation set it achieves the following results, which match the final epoch (60) in the training results:

Loss: 0.0177
Mean Distance: 0
Max Distance: 15

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.001
train_batch_size: 30
eval_batch_size: 30
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_ratio: 0.1
num_epochs: 60
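
To make the scheduler settings concrete, here is a minimal sketch of a linear schedule with warmup using the values listed above. The total of 197880 steps is taken from the last row of the training results. The function name `linear_lr` and the rule that the warmup length is the ceiling of `total_steps * warmup_ratio` are assumptions for illustration, not a copy of the training code.

```python
import math


def linear_lr(step, base_lr=1e-3, total_steps=197880, warmup_ratio=0.1):
    """Learning rate after `step` updates: linear warmup, then linear decay to 0.

    Hypothetical helper; the warmup length is assumed to be
    ceil(total_steps * warmup_ratio).
    """
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Warmup phase: ramp from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay phase: ramp from base_lr down to 0 at total_steps.
    remaining = total_steps - step
    return base_lr * max(0.0, remaining / max(1, total_steps - warmup_steps))


if __name__ == "__main__":
    # Start, end of epoch 1, end of warmup, end of epoch 30, end of training.
    for step in (0, 3298, 19788, 98940, 197880):
        print(step, linear_lr(step))
```

Under these assumptions the warmup covers the first 19788 steps, which is the first 6 of the 60 epochs, and the learning rate peaks at 0.001 before decaying linearly to zero.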

Training results

Training Loss	Epoch	Step	Validation Loss	Mean Distance	Max Distance
0.2236	1.0	3298	0.1120	5	133
0.1179	2.0	6596	0.0548	3	82
0.0829	3.0	9894	0.0425	1	46
0.0643	4.0	13192	0.0311	1	64
0.0538	5.0	16490	0.0267	1	48
0.0469	6.0	19788	0.0396	2	80
0.0385	7.0	23086	0.0262	2	73
0.0316	8.0	26384	0.0223	1	40
0.0263	9.0	29682	0.0240	1	69
0.0226	10.0	32980	0.0203	1	60
0.0203	11.0	36278	0.0177	1	54
0.0178	12.0	39576	0.0188	1	61
0.0154	13.0	42874	0.0296	1	65
0.0138	14.0	46172	0.0201	1	55
0.0120	15.0	49470	0.0268	1	67
0.0109	16.0	52768	0.0163	1	35
0.0105	17.0	56066	0.0136	1	26
0.0092	18.0	59364	0.0202	1	65
0.0087	19.0	62662	0.0221	1	65
0.0075	20.0	65960	0.0203	1	33
0.0067	21.0	69258	0.0226	1	26
0.0062	22.0	72556	0.0184	1	24
0.0059	23.0	75854	0.0131	0	18
0.0054	24.0	79152	0.0270	1	58
0.0052	25.0	82450	0.0244	1	45
0.0044	26.0	85748	0.0149	1	23
0.0043	27.0	89046	0.0256	1	63
0.0038	28.0	92344	0.0172	1	30
0.0036	29.0	95642	0.0224	1	37
0.0033	30.0	98940	0.0194	1	30
0.0031	31.0	102238	0.0238	1	59
0.0030	32.0	105536	0.0200	1	28
0.0028	33.0	108834	0.0161	0	18
0.0027	34.0	112132	0.0215	1	26
0.0025	35.0	115430	0.0198	0	19
0.0023	36.0	118728	0.0168	0	24
0.0020	37.0	122026	0.0221	1	32
0.0019	38.0	125324	0.0214	1	32
0.0017	39.0	128622	0.0186	0	19
0.0017	40.0	131920	0.0171	0	23
0.0016	41.0	135218	0.0164	0	17
0.0015	42.0	138516	0.0166	1	21
0.0014	43.0	141814	0.0167	0	21
0.0019	44.0	145112	0.0192	1	32
0.0011	45.0	148410	0.0209	1	27
0.0011	46.0	151708	0.0218	0	23
0.0010	47.0	155006	0.0195	0	25
0.0009	48.0	158304	0.0166	0	15
0.0008	49.0	161602	0.0210	1	31
0.0008	50.0	164900	0.0230	0	22
0.0008	51.0	168198	0.0184	0	15
0.0007	52.0	171496	0.0183	0	15
0.0006	53.0	174794	0.0234	1	32
0.0005	54.0	178092	0.0227	0	24
0.0004	55.0	181390	0.0188	0	15
0.0005	56.0	184688	0.0191	0	15
0.0004	57.0	187986	0.0183	0	15
0.0003	58.0	191284	0.0180	0	15
0.0003	59.0	194582	0.0180	0	15
0.0004	60.0	197880	0.0177	0	15
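
The Step column can be cross-checked against the hyperparameters with some simple arithmetic. The sketch below assumes a single device, no gradient accumulation, and that the last partial batch of each epoch is kept. None of these are stated in this card, and the helper names are hypothetical.

```python
STEPS_PER_EPOCH = 3298  # step count at epoch 1.0 in the table
BATCH_SIZE = 30         # train_batch_size from the hyperparameters
NUM_EPOCHS = 60         # num_epochs from the hyperparameters


def total_steps(steps_per_epoch, num_epochs):
    """Total optimizer steps over the whole run."""
    return steps_per_epoch * num_epochs


def example_bounds(steps_per_epoch, batch_size):
    """Smallest and largest training-set sizes that produce this many batches.

    Assumes one batch per step and that a final partial batch is kept.
    """
    low = (steps_per_epoch - 1) * batch_size + 1
    high = steps_per_epoch * batch_size
    return low, high


if __name__ == "__main__":
    print(total_steps(STEPS_PER_EPOCH, NUM_EPOCHS))
    print(example_bounds(STEPS_PER_EPOCH, BATCH_SIZE))
```

The total agrees with the final step in the table (197880). Under the stated assumptions, the training set would hold between 98911 and 98940 examples.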

Framework versions

Transformers 4.32.1
PyTorch 2.0.1+cu117
Datasets 2.14.4
Tokenizers 0.13.3