library_name: transformers
language:
  - zh
license: mit
base_model: openai/whisper-large-v3-turbo
tags:
  - wft
  - whisper
  - automatic-speech-recognition
  - audio
  - speech
  - generated_from_trainer
datasets:
  - JacobLinCool/common_voice_19_0_zh-TW
metrics:
  - wer
model-index:
  - name: whisper-large-v3-turbo-common_voice_19_0-zh-TW
    results:
      - task:
          type: automatic-speech-recognition
          name: Automatic Speech Recognition
        dataset:
          name: JacobLinCool/common_voice_19_0_zh-TW
          type: JacobLinCool/common_voice_19_0_zh-TW
        metrics:
          - type: wer
            value: 32.55535607420706
            name: Wer

whisper-large-v3-turbo-common_voice_19_0-zh-TW

This model is a fine-tuned version of openai/whisper-large-v3-turbo on the JacobLinCool/common_voice_19_0_zh-TW dataset. It achieves the following results on the evaluation set:

  • Loss: 0.1786
  • Wer: 32.5554
  • Cer: 8.6009
  • Decode Runtime: 90.9833
  • Wer Runtime: 0.1257
  • Cer Runtime: 0.1534
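
The WER and CER figures above are error rates in percent. The card does not say how the text was normalized or tokenized before scoring, so the following is only a minimal sketch of how such metrics are commonly computed: edit distance over whitespace-separated tokens for WER and over individual characters for CER. The function names are hypothetical and are not taken from the training code.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(references, hypotheses, tokenize):
    """Corpus-level error rate in percent: total edits / total reference tokens."""
    errors = 0
    total = 0
    for ref, hyp in zip(references, hypotheses):
        ref_tokens, hyp_tokens = tokenize(ref), tokenize(hyp)
        errors += edit_distance(ref_tokens, hyp_tokens)
        total += len(ref_tokens)
    return 100.0 * errors / total


def wer(references, hypotheses):
    # Assumption: words are whitespace-separated tokens.
    return error_rate(references, hypotheses, str.split)


def cer(references, hypotheses):
    # Assumption: every character (including spaces) counts as one token.
    return error_rate(references, hypotheses, list)
```

For example, `cer(["今天天氣很好"], ["今天天汽很好"])` counts one substituted character out of six. Under whitespace tokenization, Chinese text with few spaces yields long "words", which may be one reason the reported WER is much higher than the CER; this is an assumption about the evaluation script, not something the card states.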

Model description

This is an open-source Traditional Chinese (Taiwan) automatic speech recognition (ASR) model.

Intended uses & limitations

This model is designed to be a prompt-free ASR model for Traditional Chinese. Because it inherits Whisper's language identification (LID), which groups all Chinese variants under a single language token (zh), we expect performance to degrade when transcribing Simplified Chinese.
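
A minimal inference sketch follows. It rests on two assumptions: the repository ID `MODEL_ID` is inferred from the model name and dataset owner in this card and is not stated explicitly, and the hosted weights are assumed to load directly with the `transformers` ASR pipeline. Since PEFT appears among the framework versions, the repository may instead hold an adapter, in which case it would need to be loaded on top of the base model. The helper names are hypothetical.

```python
# Assumed repository ID (inferred from this card, not confirmed by it).
MODEL_ID = "JacobLinCool/whisper-large-v3-turbo-common_voice_19_0-zh-TW"


def build_generate_kwargs(language="zh"):
    """Generation options for Whisper: fix the language token, no text prompt."""
    return {"language": language, "task": "transcribe"}


def transcribe(audio_path, model_id=MODEL_ID):
    """Transcribe one audio file; downloads the model on first use."""
    # Imported lazily so the helpers above work without transformers installed.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_id)
    result = asr(audio_path, generate_kwargs=build_generate_kwargs())
    return result["text"]


# Example (requires network access and an audio file you supply):
# print(transcribe("sample.wav"))
```

No prompt is passed, which matches the prompt-free use described above; only the `zh` language token is set.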

The model is free to use under the MIT license.

Training and evaluation data

This model was trained on the Chinese (Taiwan) subset of Common Voice Corpus 19.0, which contains about 50k training examples (44 hours) and 5k test examples (5 hours). This dataset is roughly four times the size of the combined training and validation sets (train+validation) of mozilla-foundation/common_voice_16_1, which together include about 12k examples.

Training procedure

Tensorboard

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0002
  • train_batch_size: 4
  • eval_batch_size: 32
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 32
  • optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08, no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 50
  • training_steps: 5000
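
To make the hyperparameters above concrete, here is a small sketch of the arithmetic they imply: the effective batch size, and a linear schedule that warms up for 50 steps and then decays to zero at step 5000. The function names are hypothetical, and the single-device setting is an assumption that happens to agree with 4 × 8 = 32.

```python
def effective_batch_size(per_device_batch, grad_accum_steps, num_devices=1):
    """Examples consumed per optimizer step (num_devices=1 is an assumption)."""
    return per_device_batch * grad_accum_steps * num_devices


def linear_schedule_lr(step, base_lr=2e-4, warmup_steps=50, total_steps=5000):
    """Learning rate at a given optimizer step: linear warmup, then linear decay."""
    if step < warmup_steps:
        # Ramp from 0 up to base_lr over the warmup steps.
        return base_lr * step / max(1, warmup_steps)
    # Decay from base_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    print(effective_batch_size(4, 8))
    for s in (0, 25, 50, 2500, 5000):
        print(s, linear_schedule_lr(s))
```

Under this schedule the learning rate peaks at 0.0002 right after warmup and reaches zero at the final step.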

Training results

| Training Loss | Epoch  | Step | Validation Loss | Wer     | Cer     | Decode Runtime | Wer Runtime | Cer Runtime |
|:-------------:|:------:|:----:|:---------------:|:-------:|:-------:|:--------------:|:-----------:|:-----------:|
| No log        | 0      | 0    | 2.7208          | 76.5011 | 20.4851 | 89.4916        | 0.1213      | 0.1639      |
| 1.1832        | 0.1    | 500  | 0.1939          | 39.9561 | 10.8721 | 90.0926        | 0.1222      | 0.1555      |
| 1.5179        | 0.2    | 1000 | 0.1774          | 37.6621 | 9.9322  | 89.8657        | 0.1225      | 0.1545      |
| 0.6179        | 0.3    | 1500 | 0.1796          | 36.2657 | 9.8325  | 90.2480        | 0.1198      | 0.1573      |
| 0.3626        | 1.0912 | 2000 | 0.1846          | 36.2258 | 9.7801  | 90.3306        | 0.1196      | 0.1539      |
| 0.1311        | 1.1912 | 2500 | 0.1776          | 34.8095 | 9.3214  | 90.3124        | 0.1286      | 0.1610      |
| 0.1263        | 1.2912 | 3000 | 0.1763          | 36.1261 | 9.3563  | 90.4271        | 0.1330      | 0.1650      |
| 0.2194        | 2.0825 | 3500 | 0.1891          | 34.6898 | 9.3114  | 91.1932        | 0.1320      | 0.1643      |
| 0.1127        | 2.1825 | 4000 | 0.1838          | 34.0714 | 9.1095  | 90.2416        | 0.1196      | 0.1529      |
| 0.3792        | 2.2824 | 4500 | 0.1786          | 33.1339 | 8.7679  | 90.9144        | 0.1310      | 0.1550      |
| 0.0606        | 3.0737 | 5000 | 0.1786          | 32.5554 | 8.6009  | 90.9833        | 0.1257      | 0.1534      |
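
The results above can be summarized with a few lines of arithmetic. The sketch below copies the step, validation loss, WER, and CER columns from the table and reports the best checkpoint per metric along with the relative error reduction from step 0; the helper names are hypothetical.

```python
# (step, validation_loss, wer, cer), copied from the training results table.
ROWS = [
    (0, 2.7208, 76.5011, 20.4851),
    (500, 0.1939, 39.9561, 10.8721),
    (1000, 0.1774, 37.6621, 9.9322),
    (1500, 0.1796, 36.2657, 9.8325),
    (2000, 0.1846, 36.2258, 9.7801),
    (2500, 0.1776, 34.8095, 9.3214),
    (3000, 0.1763, 36.1261, 9.3563),
    (3500, 0.1891, 34.6898, 9.3114),
    (4000, 0.1838, 34.0714, 9.1095),
    (4500, 0.1786, 33.1339, 8.7679),
    (5000, 0.1786, 32.5554, 8.6009),
]
LOSS, WER, CER = 1, 2, 3  # column indices into each row


def best_by(rows, column):
    """Row with the lowest value in the given column."""
    return min(rows, key=lambda row: row[column])


def relative_reduction(before, after):
    """Relative reduction in percent from `before` to `after`."""
    return 100.0 * (before - after) / before


if __name__ == "__main__":
    first, last = ROWS[0], ROWS[-1]
    print("best loss at step", best_by(ROWS, LOSS)[0])
    print("best WER at step", best_by(ROWS, WER)[0])
    print("WER reduction (%)", round(relative_reduction(first[WER], last[WER]), 1))
    print("CER reduction (%)", round(relative_reduction(first[CER], last[CER]), 1))
```

On these numbers the lowest validation loss occurs at step 3000, while the lowest WER and CER both occur at the final step, so checkpoint selection depends on which metric is prioritized.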

Framework versions

  • PEFT 0.13.2
  • Transformers 4.46.1
  • PyTorch 2.4.0
  • Datasets 3.0.2
  • Tokenizers 0.20.1