Whisper Large V2 zh-HK - Alvin

This model is a fine-tuned version of openai/whisper-large-v2 on the Common Voice 11.0 dataset. This is trained with PEFT LoRA+BNB INT8 with a Normalized CER of 7.77%

To use the model, use the following code. It should be able to inference with less than 4GB VRAM (batch size of 1).

from peft import PeftModel, PeftConfig
from transformers import WhisperForConditionalGeneration, Seq2SeqTrainer, WhisperTokenizer, WhisperProcessor

peft_model_id = "alvanlii/whisper-largev2-cantonese-peft-lora"
peft_config = PeftConfig.from_pretrained(peft_model_id)
model = WhisperForConditionalGeneration.from_pretrained(
    peft_config.base_model_name_or_path, load_in_8bit=True, device_map="auto"
)
model = PeftModel.from_pretrained(model, peft_model_id)

task = "transcribe"
tokenizer = WhisperTokenizer.from_pretrained(peft_config.base_model_name_or_path, task=task)
processor = WhisperProcessor.from_pretrained(peft_config.base_model_name_or_path, task=task)
feature_extractor = processor.feature_extractor
forced_decoder_ids = processor.get_decoder_prompt_ids(language=language, task=task)
pipe = AutomaticSpeechRecognitionPipeline(model=model, tokenizer=tokenizer, feature_extractor=feature_extractor)

audio = # load audio here
text = pipe(audio, generate_kwargs={"forced_decoder_ids": forced_decoder_ids}, max_new_tokens=255)["text"]

Training and evaluation data

For training, three datasets were used:

  • Common Voice 11 Canto Train Set
  • CantoMap: Winterstein, Grégoire, Tang, Carmen and Lai, Regine (2020) "CantoMap: a Hong Kong Cantonese MapTask Corpus", in Proceedings of The 12th Language Resources and Evaluation Conference, Marseille: European Language Resources Association, p. 2899-2906.
  • Cantonse-ASR: Yu, Tiezheng, Frieske, Rita, Xu, Peng, Cahyawijaya, Samuel, Yiu, Cheuk Tung, Lovenia, Holy, Dai, Wenliang, Barezi, Elham, Chen, Qifeng, Ma, Xiaojuan, Shi, Bertram, Fung, Pascale (2022) "Automatic Speech Recognition Datasets in Cantonese: A Survey and New Dataset", 2022. Link: https://arxiv.org/pdf/2201.02419.pdf

Training Hyperparameters

  • learning_rate: 1e-3
  • train_batch_size: 60 (on 1 3090 GPU)
  • eval_batch_size: 10
  • gradient_accumulation_steps: 1
  • total_train_batch_size: 60x1x1=60
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 500
  • training_steps: 12000
  • augmentation: SpecAugment

Training Results

Training Loss Epoch Step Validation Loss Normalized CER
0.8604 1.99 12000 0.2129 0.07766
Downloads last month

-

Downloads are not tracked for this model. How to track
Inference Examples
Unable to determine this model's library. Check the docs .

Dataset used to train alvanlii/whisper-largev2-cantonese-peft-lora

Evaluation results