Whisper Medium Optimized for Stuttered Speech

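This model is a fine-tuned version of openai/whisper-medium on the TimeStamped dataset. Evaluation results for every checkpoint appear in the table at the end of this card; at the final step (8000) the model reaches a validation loss of 1.9889, a WER of 14.0509, an orthographic WER of 7.9350, and a CER of 7.9188.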

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 8e-06
train_batch_size: 8
eval_batch_size: 8
seed: 42
gradient_accumulation_steps: 4
total_train_batch_size: 32
optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
training_steps: 8000
mixed_precision_training: Native AMP
label_smoothing_factor: 0.1
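
The values above interact in two ways that are easy to check by hand: the effective batch size and the shape of the learning-rate schedule. The following is a minimal sketch, not the original training script. It assumes a single device, a linear warmup followed by cosine decay to zero (the usual meaning of a cosine scheduler with a warmup ratio), and that "Native AMP" corresponds to fp16. The names `lr_at` and `training_kwargs` are hypothetical and introduced only for illustration.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 8e-06
TRAIN_BATCH_SIZE = 8
EVAL_BATCH_SIZE = 8
GRAD_ACCUM_STEPS = 4
TRAINING_STEPS = 8000
WARMUP_RATIO = 0.1

# Assumption: one device, so the effective batch is 8 * 4 * 1 = 32.
NUM_DEVICES = 1
total_train_batch_size = TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS * NUM_DEVICES

# Number of optimizer steps spent warming up (10% of 8000).
warmup_steps = round(TRAINING_STEPS * WARMUP_RATIO)


def lr_at(step: int) -> float:
    """Learning rate after `step` optimizer updates.

    Assumed shape: linear warmup from 0, then half-cosine decay to 0.
    """
    if step < warmup_steps:
        return LEARNING_RATE * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, TRAINING_STEPS - warmup_steps)
    return LEARNING_RATE * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


# The same settings expressed as keyword arguments in the naming used by
# Hugging Face training arguments. fp16=True is an assumption: the card
# only says "Native AMP".
training_kwargs = {
    "learning_rate": LEARNING_RATE,
    "per_device_train_batch_size": TRAIN_BATCH_SIZE,
    "per_device_eval_batch_size": EVAL_BATCH_SIZE,
    "gradient_accumulation_steps": GRAD_ACCUM_STEPS,
    "seed": 42,
    "optim": "adamw_torch",
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-08,
    "lr_scheduler_type": "cosine",
    "warmup_ratio": WARMUP_RATIO,
    "max_steps": TRAINING_STEPS,
    "fp16": True,
    "label_smoothing_factor": 0.1,
}

print(total_train_batch_size, warmup_steps, lr_at(warmup_steps))
```

Under these assumptions the learning rate peaks at 8e-06 after 800 steps and decays to zero by step 8000. With transformers installed, the dictionary could be passed to `Seq2SeqTrainingArguments` together with an `output_dir` of your choice.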

Training Loss	Epoch	Step	Validation Loss	WER	WER Ortho	CER
1.4886	5.8187	500	1.6803	13.3866	6.8710	6.8779
1.4379	11.6316	1000	1.7141	11.9117	7.1422	7.1376
1.4115	17.4444	1500	1.7555	11.5740	6.3332	6.3308
1.4093	23.2573	2000	1.7849	13.2177	7.8678	7.8655
1.4068	29.0702	2500	1.8295	12.0243	6.7643	6.7527
1.4178	34.8889	3000	1.8251	16.7530	10.8906	10.8790
1.4169	40.7018	3500	1.8499	12.3959	6.8872	6.8802
1.4005	46.5146	4000	1.8905	13.2740	7.4482	7.4366
1.3999	52.3275	4500	1.9124	13.3191	7.4946	7.4830
1.3999	58.1404	5000	1.9269	13.9045	7.8376	7.8237
1.4135	63.9591	5500	1.9485	13.8257	7.8585	7.8446
1.4133	69.7719	6000	1.9668	13.8820	7.8446	7.8284
1.4132	75.5848	6500	1.9779	13.9833	7.8863	7.8724
1.3989	81.3977	7000	1.9854	13.9946	7.8770	7.8608
1.3989	87.2105	7500	1.9884	14.0396	7.9280	7.9118
1.3989	93.0234	8000	1.9889	14.0509	7.9350	7.9188
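
Because the metrics do not improve monotonically, it can help to scan the table for the best checkpoint rather than assume the last one is best. The snippet below is a small sketch that copies the Step, Validation Loss, and WER columns from the table above and picks the row with the lowest value of each. The variable names are hypothetical, and the card does not say which checkpoint was actually published.

```python
# (step, validation_loss, wer) copied from the results table above.
rows = [
    (500, 1.6803, 13.3866),
    (1000, 1.7141, 11.9117),
    (1500, 1.7555, 11.5740),
    (2000, 1.7849, 13.2177),
    (2500, 1.8295, 12.0243),
    (3000, 1.8251, 16.7530),
    (3500, 1.8499, 12.3959),
    (4000, 1.8905, 13.2740),
    (4500, 1.9124, 13.3191),
    (5000, 1.9269, 13.9045),
    (5500, 1.9485, 13.8257),
    (6000, 1.9668, 13.8820),
    (6500, 1.9779, 13.9833),
    (7000, 1.9854, 13.9946),
    (7500, 1.9884, 14.0396),
    (8000, 1.9889, 14.0509),
]

# Checkpoint with the lowest word error rate.
best_wer_row = min(rows, key=lambda r: r[2])

# Checkpoint with the lowest validation loss.
best_loss_row = min(rows, key=lambda r: r[1])

print(f"lowest WER: {best_wer_row[2]} at step {best_wer_row[0]}")
print(f"lowest validation loss: {best_loss_row[1]} at step {best_loss_row[0]}")
```

On these numbers the lowest WER (11.5740) occurs at step 1500 and the lowest validation loss (1.6803) at step 500, both well before the final step.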