SpeechT5_finetuned_kha

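This model is a fine-tuned version of microsoft/speecht5_vc. The dataset is reported as audiofolder, which is the name of the generic Hugging Face Datasets loader for local audio files rather than a named dataset, so the training data was presumably a local or custom audio collection. It achieves the following results on the evaluation set: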

  • Loss: 0.4733

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 3e-05
  • train_batch_size: 32
  • eval_batch_size: 16
  • seed: 42
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 512
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 1000
  • num_epochs: 300
  • mixed_precision_training: Native AMP
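
The relationship between the batch-size settings and the learning-rate schedule above can be sketched in a few lines of plain Python. This is a minimal illustration, not the training script: the helper names are hypothetical, a single device is assumed (which is consistent with 32 × 16 = 512), and TOTAL_STEPS is an assumed value taken from the last logged step in the results table, since the exact total number of optimizer steps is not stated in this card.

```python
# Values copied from the hyperparameter list above.
LEARNING_RATE = 3e-5
TRAIN_BATCH_SIZE = 32
GRADIENT_ACCUMULATION_STEPS = 16
WARMUP_STEPS = 1000

# ASSUMPTION: the card does not state the total number of optimizer steps.
# 8000 is the last step logged in the results table and is used for illustration only.
TOTAL_STEPS = 8000


def effective_batch_size(per_device, accumulation_steps, num_devices=1):
    """Examples consumed per optimizer step (num_devices=1 is an assumption)."""
    return per_device * accumulation_steps * num_devices


def linear_lr(step, base_lr=LEARNING_RATE, warmup=WARMUP_STEPS, total=TOTAL_STEPS):
    """Linear warmup from 0 to base_lr, then linear decay back to 0 at `total`."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    remaining = max(0, total - step)
    return base_lr * remaining / max(1, total - warmup)


if __name__ == "__main__":
    print(effective_batch_size(TRAIN_BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS))
    for step in (0, 500, 1000, 4500, 8000):
        print(step, linear_lr(step))
```

Under these assumptions the learning rate reaches its peak of 3e-05 at step 1000 and then decays linearly, so most of the run is spent at a steadily shrinking learning rate.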

Training results

Training Loss   Epoch      Step   Validation Loss
0.544           36.8664    1000   0.5145
0.5013          73.7327    2000   0.4800
0.4754          110.5991   3000   0.4705
0.4651          147.4654   4000   0.4710
0.456           184.3318   5000   0.4699
0.446           221.1982   6000   0.4702
0.443           258.0645   7000   0.4714
0.4437          294.9309   8000   0.4733
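
The logged results can be inspected with a short script. The sketch below copies the rows from the table above and looks up the checkpoint with the lowest validation loss, the gap between validation and training loss, and the approximate number of optimizer steps per epoch. The variable names are illustrative only, and nothing here is taken from the original training code.

```python
# (training_loss, epoch, step, validation_loss), copied from the results table.
rows = [
    (0.5440, 36.8664, 1000, 0.5145),
    (0.5013, 73.7327, 2000, 0.4800),
    (0.4754, 110.5991, 3000, 0.4705),
    (0.4651, 147.4654, 4000, 0.4710),
    (0.4560, 184.3318, 5000, 0.4699),
    (0.4460, 221.1982, 6000, 0.4702),
    (0.4430, 258.0645, 7000, 0.4714),
    (0.4437, 294.9309, 8000, 0.4733),
]

# Logged step with the lowest validation loss.
best = min(rows, key=lambda r: r[3])
best_step, best_val = best[2], best[3]

# Validation loss minus training loss at each logged step.
gaps = [(step, val - train) for train, _epoch, step, val in rows]

# Optimizer steps per epoch, estimated from the first logged row.
steps_per_epoch = rows[0][2] / rows[0][1]

if __name__ == "__main__":
    print(best_step, best_val)  # 5000 0.4699
    for step, gap in gaps:
        print(step, f"{gap:+.4f}")
    print(f"{steps_per_epoch:.2f} steps per epoch")
```

Read this way, the table shows validation loss flattening from about step 3000 onward and reaching its minimum at step 5000, while the reported final loss of 0.4733 comes from the last logged step. The gap between validation and training loss turns positive and widens over the second half of the run, so an earlier checkpoint may generalize at least as well as the final one.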

Framework versions

  • Transformers 4.43.3
  • Pytorch 2.4.0
  • Datasets 3.0.1
  • Tokenizers 0.19.1