Model Card for sandy1990418/whisper-large-v3-turbo-chinese

This model card describes a fine-tuned version of the Whisper-large-v3-turbo model, optimized for Mandarin automatic speech recognition (ASR). The model was fine-tuned on the Common Voice 13.0 dataset using PEFT with LoRA, which keeps training efficient while maintaining the performance of the original model. It achieves the following word error rate (WER) results on the evaluation sets:

  • Common Voice 13.0 dataset (test):
    WER before fine-tuning: 77.08
    WER after fine-tuning: 40.29
  • Common Voice 16.1 dataset (test):
    WER before fine-tuning: 77.57
    WER after fine-tuning: 40.39
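
To put these figures in context, the following is a minimal sketch of how an error rate of this kind can be computed and how the before/after numbers compare in relative terms. The card does not state how the text was tokenized or normalized before scoring, so the `tokenize` parameter and the helper names (`edit_distance`, `error_rate`, `relative_reduction`) are illustrative assumptions, not the author's evaluation code.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(references, hypotheses, tokenize=str.split):
    """Corpus-level error rate in percent: total edits / total reference tokens.

    tokenize=str.split scores whitespace-separated words (WER);
    tokenize=list scores individual characters (CER), which is a common
    choice for unsegmented Mandarin text. Which one the card used is unknown.
    """
    edits = 0
    total = 0
    for ref, hyp in zip(references, hypotheses):
        ref_tokens, hyp_tokens = tokenize(ref), tokenize(hyp)
        edits += edit_distance(ref_tokens, hyp_tokens)
        total += len(ref_tokens)
    return 100.0 * edits / total


def relative_reduction(before, after):
    """Relative error reduction in percent."""
    return 100.0 * (before - after) / before


# Figures reported above for the Common Voice 13.0 and 16.1 test sets.
print(round(relative_reduction(77.08, 40.29), 2))  # 47.73
print(round(relative_reduction(77.57, 40.39), 2))  # 47.93
```

On both test sets the fine-tuned model removes a little under half of the baseline errors.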

Uses

import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline
from datasets import load_dataset


# Use the GPU with half precision when available; otherwise fall back to CPU with float32.
device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

model_id = "sandy1990418/whisper-large-v3-turbo-chinese"

# Load the fine-tuned checkpoint and move it to the selected device.
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
)
model.to(device)

# The processor bundles the tokenizer and the audio feature extractor.
processor = AutoProcessor.from_pretrained(model_id)

pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    torch_dtype=torch_dtype,
    device=device,
)

# Demo audio only: this dataset contains English speech.
# Substitute Mandarin audio to exercise the fine-tuned model.
dataset = load_dataset("distil-whisper/librispeech_long", "clean", split="validation")
sample = dataset[0]["audio"]

result = pipe(sample)
print(result["text"])
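
Whisper detects the spoken language automatically by default. For Mandarin audio it can help to set the language and task explicitly through the pipeline's `generate_kwargs`. The helper below is a hypothetical convenience wrapper (the name `transcribe_mandarin` is not part of the model or of `transformers`); it assumes `asr_pipe` is an automatic-speech-recognition pipeline such as the `pipe` object built above.

```python
def transcribe_mandarin(asr_pipe, audio):
    """Run an ASR pipeline with Whisper's language and task set explicitly.

    asr_pipe: a callable ASR pipeline (assumed to accept generate_kwargs).
    audio: anything the pipeline accepts, e.g. a file path or an audio dict.
    """
    result = asr_pipe(
        audio,
        generate_kwargs={"language": "zh", "task": "transcribe"},
    )
    return result["text"].strip()
```

With the pipeline from the example above, a call would look like `transcribe_mandarin(pipe, "recording.wav")`, where `recording.wav` is a placeholder path to a local Mandarin recording.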

Model size: 809M params (Safetensors, tensor type FP16)