ardzdirect2

This model is a fine-tuned version of facebook/mms-1b-all on an unknown dataset. It achieves the following results on the evaluation set (these match the final logged checkpoint, step 4800):

Loss: 0.2790
WER: 0.4103
BLEU: 0.3474
ROUGE: {'rouge1': 0.0, 'rouge2': 0.0, 'rougeL': 0.0, 'rougeLsum': 0.0}
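For readers unfamiliar with the word error rate reported above, the following is a minimal sketch of how WER is commonly defined: the word-level edit distance between reference and hypothesis, divided by the number of reference words. The function name `word_error_rate` and the example sentences are illustrative assumptions, not taken from this model's evaluation code, and the card's figure is most likely aggregated over the whole evaluation set rather than computed per sentence.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative helper (hypothetical name); not the evaluation code
    used to produce the numbers in this card.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the") over 6 reference words.
example = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

Under this definition, a WER of 0.4103 corresponds to roughly 41 word errors per 100 reference words.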

Model description

More information needed

Intended uses & limitations

More information needed
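The card does not yet document usage. As a minimal sketch, assuming the checkpoint is published on the Hugging Face Hub as ilyes25/ardzdirect2 and loads as a speech-recognition model like its base facebook/mms-1b-all, inference with the Transformers `pipeline` API might look like the following. The helper name `transcribe` and the audio path are hypothetical; whether the repository ships the processor files the pipeline needs has not been verified here.

```python
MODEL_ID = "ilyes25/ardzdirect2"  # repository id as shown on the model page


def transcribe(audio_path: str, model_id: str = MODEL_ID) -> str:
    """Transcribe one audio file with the fine-tuned checkpoint.

    Hypothetical helper: assumes `transformers` and `torch` are installed,
    that ffmpeg is available for decoding the audio file, and that the
    checkpoint can be loaded by the automatic-speech-recognition pipeline.
    """
    # Imported lazily so the module can be imported without transformers installed.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_id)
    return asr(audio_path)["text"]


if __name__ == "__main__":
    # "sample.wav" is a placeholder path, not a file shipped with the model.
    print(transcribe("sample.wav"))
```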

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.001
train_batch_size: 8
eval_batch_size: 8
seed: 42
gradient_accumulation_steps: 4
total_train_batch_size: 32
optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 100
num_epochs: 40
mixed_precision_training: Native AMP
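Two of the values above follow from the others, and a short sketch can make that explicit. The total train batch size is the per-device batch size times the gradient accumulation steps (times the device count, assumed here to be 1), and a linear scheduler with warmup ramps the learning rate up for 100 steps and then decays it linearly to zero. The function names are illustrative, and `total_steps=4800` is an assumption based on the last logged step in the results table; the exact number of optimizer steps in the run is not stated in this card.

```python
def effective_batch_size(per_device: int, grad_accum: int, num_devices: int = 1) -> int:
    """Samples contributing to each optimizer step (num_devices=1 is an assumption)."""
    return per_device * grad_accum * num_devices


def linear_lr(step: int, base_lr: float = 1e-3,
              warmup_steps: int = 100, total_steps: int = 4800) -> float:
    """Linear warmup followed by linear decay to zero.

    total_steps is a placeholder taken from the last logged step;
    the real value for this run is not given in the card.
    """
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


total_batch = effective_batch_size(per_device=8, grad_accum=4)  # 8 * 4 = 32
peak_lr = linear_lr(100)   # end of warmup: the full learning rate
mid_lr = linear_lr(2450)   # halfway through the decay phase
```

With these assumptions the learning rate reaches 0.001 at step 100 and is about half that value at step 2450.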

Training results

Training Loss	Epoch	Step	Validation Loss	WER	BLEU	ROUGE
3.1012	0.8316	100	0.4876	0.6806	0.0921	{'rouge1': 0.0, 'rouge2': 0.0, 'rougeL': 0.0, 'rougeLsum': 0.0}
0.5491	1.6570	200	0.4093	0.6217	0.1411	{'rouge1': 0.0, 'rouge2': 0.0, 'rougeL': 0.0, 'rougeLsum': 0.0}
0.5057	2.4823	300	0.3771	0.6112	0.1300	{'rouge1': 0.0, 'rouge2': 0.0, 'rougeL': 0.0, 'rougeLsum': 0.0}
0.4825	3.3077	400	0.3685	0.6012	0.1617	{'rouge1': 0.0, 'rouge2': 0.0, 'rougeL': 0.0, 'rougeLsum': 0.0}
0.4542	4.1331	500	0.3629	0.5924	0.1538	{'rouge1': 0.0, 'rouge2': 0.0, 'rougeL': 0.0, 'rougeLsum': 0.0}
0.4481	4.9647	600	0.3563	0.5766	0.1571	{'rouge1': 0.0, 'rouge2': 0.0, 'rougeL': 0.0, 'rougeLsum': 0.0}
0.4405	5.7900	700	0.3521	0.5841	0.1523	{'rouge1': 0.0, 'rouge2': 0.0, 'rougeL': 0.0, 'rougeLsum': 0.0}
0.417	6.6154	800	0.3460	0.5775	0.1802	{'rouge1': 0.0, 'rouge2': 0.0, 'rougeL': 0.0, 'rougeLsum': 0.0}
0.4034	7.4407	900	0.3478	0.5748	0.1852	{'rouge1': 0.0, 'rouge2': 0.0, 'rougeL': 0.0, 'rougeLsum': 0.0}
0.4108	8.2661	1000	0.3490	0.5529	0.1896	{'rouge1': 0.0, 'rouge2': 0.0, 'rougeL': 0.0, 'rougeLsum': 0.0}
0.3858	9.0915	1100	0.3277	0.5514	0.1920	{'rouge1': 0.0, 'rouge2': 0.0, 'rougeL': 0.0, 'rougeLsum': 0.0}
0.3831	9.9231	1200	0.3192	0.5474	0.2086	{'rouge1': 0.0, 'rouge2': 0.0, 'rougeL': 0.0, 'rougeLsum': 0.0}
0.3793	10.7484	1300	0.3265	0.5316	0.2156	{'rouge1': 0.0, 'rouge2': 0.0, 'rougeL': 0.0, 'rougeLsum': 0.0}
0.3691	11.5738	1400	0.3161	0.5341	0.2193	{'rouge1': 0.0, 'rouge2': 0.0, 'rougeL': 0.0, 'rougeLsum': 0.0}
0.3629	12.3992	1500	0.3108	0.5181	0.2280	{'rouge1': 0.0, 'rouge2': 0.0, 'rougeL': 0.0, 'rougeLsum': 0.0}
0.3619	13.2245	1600	0.3102	0.5214	0.2184	{'rouge1': 0.0, 'rouge2': 0.0, 'rougeL': 0.0, 'rougeLsum': 0.0}
0.356	14.0499	1700	0.3249	0.5145	0.2345	{'rouge1': 0.0, 'rouge2': 0.0, 'rougeL': 0.0, 'rougeLsum': 0.0}
0.3392	14.8815	1800	0.3409	0.5234	0.2352	{'rouge1': 0.0, 'rouge2': 0.0, 'rougeL': 0.0, 'rougeLsum': 0.0}
0.3254	15.7069	1900	0.3034	0.5288	0.2279	{'rouge1': 0.0, 'rouge2': 0.0, 'rougeL': 0.0, 'rougeLsum': 0.0}
0.3446	16.5322	2000	0.3273	0.5074	0.2459	{'rouge1': 0.0, 'rouge2': 0.0, 'rougeL': 0.0, 'rougeLsum': 0.0}
0.3197	17.3576	2100	0.3097	0.5306	0.2287	{'rouge1': 0.0, 'rouge2': 0.0, 'rougeL': 0.0, 'rougeLsum': 0.0}
0.3167	18.1830	2200	0.3042	0.5164	0.2428	{'rouge1': 0.0, 'rouge2': 0.0, 'rougeL': 0.0, 'rougeLsum': 0.0}
0.3296	19.0083	2300	0.3053	0.5265	0.2271	{'rouge1': 0.0, 'rouge2': 0.0, 'rougeL': 0.0, 'rougeLsum': 0.0}
0.3158	19.8399	2400	0.3004	0.4763	0.2703	{'rouge1': 0.0, 'rouge2': 0.0, 'rougeL': 0.0, 'rougeLsum': 0.0}
0.3035	20.6653	2500	0.2917	0.4649	0.2836	{'rouge1': 0.0, 'rouge2': 0.0, 'rougeL': 0.0, 'rougeLsum': 0.0}
0.3026	21.4906	2600	0.2993	0.5098	0.2498	{'rouge1': 0.0, 'rouge2': 0.0, 'rougeL': 0.0, 'rougeLsum': 0.0}
0.3023	22.3160	2700	0.3164	0.4760	0.2700	{'rouge1': 0.0, 'rouge2': 0.0, 'rougeL': 0.0, 'rougeLsum': 0.0}
0.2879	23.1414	2800	0.2825	0.4441	0.3079	{'rouge1': 0.0, 'rouge2': 0.0, 'rougeL': 0.0, 'rougeLsum': 0.0}
0.2834	23.9730	2900	0.2828	0.4685	0.2866	{'rouge1': 0.0, 'rouge2': 0.0, 'rougeL': 0.0, 'rougeLsum': 0.0}
0.2793	24.7983	3000	0.2938	0.4437	0.3082	{'rouge1': 0.0, 'rouge2': 0.0, 'rougeL': 0.0, 'rougeLsum': 0.0}
0.2706	25.6237	3100	0.2827	0.4508	0.3054	{'rouge1': 0.0, 'rouge2': 0.0, 'rougeL': 0.0, 'rougeLsum': 0.0}
0.2631	26.4491	3200	0.2871	0.4309	0.3264	{'rouge1': 0.0010395010395010396, 'rouge2': 0.0, 'rougeL': 0.0010395010395010396, 'rougeLsum': 0.0010395010395010396}
0.2742	27.2744	3300	0.2814	0.4360	0.3181	{'rouge1': 0.0010395010395010396, 'rouge2': 0.0, 'rougeL': 0.0010395010395010396, 'rougeLsum': 0.0010395010395010396}
0.2537	28.0998	3400	0.2923	0.4320	0.3197	{'rouge1': 0.0, 'rouge2': 0.0, 'rougeL': 0.0, 'rougeLsum': 0.0}
0.2576	28.9314	3500	0.2784	0.4296	0.3238	{'rouge1': 0.0, 'rouge2': 0.0, 'rougeL': 0.0, 'rougeLsum': 0.0}
0.2588	29.7568	3600	0.2830	0.4280	0.3304	{'rouge1': 0.0, 'rouge2': 0.0, 'rougeL': 0.0, 'rougeLsum': 0.0}
0.243	30.5821	3700	0.2860	0.4254	0.3331	{'rouge1': 0.0, 'rouge2': 0.0, 'rougeL': 0.0, 'rougeLsum': 0.0}
0.2504	31.4075	3800	0.2829	0.4171	0.3403	{'rouge1': 0.0, 'rouge2': 0.0, 'rougeL': 0.0, 'rougeLsum': 0.0}
0.2491	32.2328	3900	0.2850	0.4194	0.3374	{'rouge1': 0.0, 'rouge2': 0.0, 'rougeL': 0.0, 'rougeLsum': 0.0}
0.2432	33.0582	4000	0.2901	0.4158	0.3359	{'rouge1': 0.0, 'rouge2': 0.0, 'rougeL': 0.0, 'rougeLsum': 0.0}
0.2383	33.8898	4100	0.2801	0.4171	0.3366	{'rouge1': 0.0, 'rouge2': 0.0, 'rougeL': 0.0, 'rougeLsum': 0.0}
0.2314	34.7152	4200	0.2818	0.4190	0.3404	{'rouge1': 0.0, 'rouge2': 0.0, 'rougeL': 0.0, 'rougeLsum': 0.0}
0.2339	35.5405	4300	0.2858	0.4146	0.3412	{'rouge1': 0.0, 'rouge2': 0.0, 'rougeL': 0.0, 'rougeLsum': 0.0}
0.2314	36.3659	4400	0.2954	0.4224	0.3315	{'rouge1': 0.0, 'rouge2': 0.0, 'rougeL': 0.0, 'rougeLsum': 0.0}
0.2324	37.1913	4500	0.2810	0.4119	0.3441	{'rouge1': 0.0, 'rouge2': 0.0, 'rougeL': 0.0, 'rougeLsum': 0.0}
0.236	38.0166	4600	0.2791	0.4105	0.3462	{'rouge1': 0.0, 'rouge2': 0.0, 'rougeL': 0.0, 'rougeLsum': 0.0}
0.2277	38.8482	4700	0.2799	0.4110	0.3482	{'rouge1': 0.0, 'rouge2': 0.0, 'rougeL': 0.0, 'rougeLsum': 0.0}
0.2207	39.6736	4800	0.2790	0.4103	0.3474	{'rouge1': 0.0, 'rouge2': 0.0, 'rougeL': 0.0, 'rougeLsum': 0.0}
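The metrics in the table do not all peak at the same step: the lowest validation loss is logged at step 3500, the lowest WER at step 4800, and the highest BLEU at step 4700. The sketch below shows one way to pick a checkpoint per metric. It uses only a handful of rows copied from the table, and the helper name `best_step` is illustrative.

```python
# (step, validation_loss, wer, bleu) copied from selected rows of the table above
ROWS = [
    (3500, 0.2784, 0.4296, 0.3238),
    (4500, 0.2810, 0.4119, 0.3441),
    (4600, 0.2791, 0.4105, 0.3462),
    (4700, 0.2799, 0.4110, 0.3482),
    (4800, 0.2790, 0.4103, 0.3474),
]


def best_step(rows, column: int, higher_is_better: bool = False) -> int:
    """Return the step whose value in `column` is best (lowest by default)."""
    pick = max if higher_is_better else min
    return pick(rows, key=lambda row: row[column])[0]


best_by_loss = best_step(ROWS, column=1)                         # lower is better
best_by_wer = best_step(ROWS, column=2)                          # lower is better
best_by_bleu = best_step(ROWS, column=3, higher_is_better=True)  # higher is better
```

Which checkpoint counts as "best" therefore depends on the metric that matters for the intended use; the differences among the last few checkpoints are small.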

Framework versions

Transformers 4.49.0
PyTorch 2.6.0+cu124
Datasets 3.2.0
Tokenizers 0.21.0

Model repository: ilyes25/ardzdirect2