A noise-detection variant has been trained on top of this model to reduce hallucination during streaming:
Name: JackyHoCL/whisper-large-v3-turbo-cantonese-noise-detection
https://huggingface.co/JackyHoCL/whisper-large-v3-turbo-cantonese-noise-detection
transformers-4.49.0
TODO: Improve zh-CN performance
2025-07-06: CER:
Dataset | Lang | Split | CER (%) |
---|---|---|---|
Training | yue | validation | 8.92 |
mozilla-foundation/common_voice_17_0 | yue | test | 8.86 |
JackyHoCL/cleaned_mixed_cantonese_and_english_speech | yue | test | 7.96 |
mozilla-foundation/common_voice_17_0 | en | test | 6.84 |
mozilla-foundation/common_voice_16_1 | zh-CN | test | 43.0 |
per_device_train_batch_size=32,
learning_rate=1e-7,
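
For readers unfamiliar with the metric, the sketch below shows how a character error rate (CER) like the ones in these tables is commonly computed: the character-level Levenshtein distance between reference and hypothesis, summed over the test set and divided by the total number of reference characters. This is an illustration only. The evaluation script and any text normalization behind the reported numbers are not given here, and the function names `edit_distance` and `cer_percent` are hypothetical.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance between two strings."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # character missing from the hypothesis
                curr[j - 1] + 1,     # extra character in the hypothesis
                prev[j - 1] + cost,  # substitution, or a match when cost is 0
            ))
        prev = curr
    return prev[-1]


def cer_percent(references, hypotheses) -> float:
    """Corpus-level CER in percent: total edits / total reference characters."""
    total_edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("references must contain at least one character")
    return 100.0 * total_edits / total_chars
```

Because the score is counted in characters rather than words, it suits Cantonese and mixed Cantonese-English text, where word boundaries are not marked by spaces.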
2025-07-03: CER:
Dataset | Lang | Split | CER (%) |
---|---|---|---|
Training | yue | validation | 9.705 |
mozilla-foundation/common_voice_17_0 | yue | test | 9.31 |
JackyHoCL/cleaned_mixed_cantonese_and_english_speech | yue | test | 8.37 |
per_device_train_batch_size=32,
learning_rate=1e-5,
CER: 13.7%
Train Args:
per_device_train_batch_size=16,
gradient_accumulation_steps=1,
learning_rate=1e-5,
gradient_checkpointing=True,
per_device_eval_batch_size=16,
generation_max_length=225,
Hardware:
NVIDIA Tesla V100 16GB * 4
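
The arguments and hardware listed above can be collected into one place, as in the sketch below. The key names look like fields of `Seq2SeqTrainingArguments` in transformers, but the trainer class is not named here, so treat that mapping as an assumption. The effective batch size calculation further assumes plain data-parallel training across all four GPUs, which is also not stated. `effective_batch_size` is a hypothetical helper.

```python
# Values copied from the "Train Args" list above.
train_args = {
    "per_device_train_batch_size": 16,
    "gradient_accumulation_steps": 1,
    "learning_rate": 1e-5,
    "gradient_checkpointing": True,
    "per_device_eval_batch_size": 16,
    "generation_max_length": 225,
}

# From the "Hardware" line; assumes every GPU takes part in data-parallel training.
NUM_GPUS = 4


def effective_batch_size(args: dict, num_devices: int) -> int:
    """Samples contributing to one optimizer step under data parallelism."""
    return (
        args["per_device_train_batch_size"]
        * args["gradient_accumulation_steps"]
        * num_devices
    )


print(effective_batch_size(train_args, NUM_GPUS))
```

Under those assumptions each optimizer step sees 16 × 1 × 4 = 64 samples. If the dictionary is passed to a real trainer configuration, an `output_dir` and other required settings would still need to be supplied.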
An example real-time streaming application built on this model:
https://github.com/JackyHoCL/whisper-realtime.git
FAQ:
- If you run into tokenizer errors during inference, upgrade transformers to version 4.49.0 or later:
pip install --upgrade transformers
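
The version requirement above can also be checked in code before loading the model. The sketch below is one way to do it, not an official usage snippet: `parse_version`, `transformers_is_recent_enough`, and `transcribe` are hypothetical helpers, and the audio path and model ID are supplied by the caller. It uses the standard `automatic-speech-recognition` pipeline from transformers with default settings.

```python
from importlib.metadata import PackageNotFoundError, version

MIN_TRANSFORMERS = (4, 49, 0)  # minimum version named in the FAQ above


def parse_version(text: str) -> tuple:
    """Turn '4.49.0' or '4.50.0.dev0' into a tuple of at least three integers."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric component, e.g. 'dev0'
        parts.append(int(digits))
    while len(parts) < 3:
        parts.append(0)  # pad so that '4.49' compares equal to '4.49.0'
    return tuple(parts)


def transformers_is_recent_enough(installed: str) -> bool:
    """True if the given version string is at least MIN_TRANSFORMERS."""
    return parse_version(installed) >= MIN_TRANSFORMERS


def transcribe(audio_path: str, model_id: str) -> str:
    """Check the installed transformers version, then transcribe one audio file."""
    try:
        installed = version("transformers")
    except PackageNotFoundError as exc:
        raise RuntimeError("transformers is not installed") from exc
    if not transformers_is_recent_enough(installed):
        raise RuntimeError(
            f"transformers {installed} is older than 4.49.0; "
            "run: pip install --upgrade transformers"
        )
    # Imported lazily so the version check works even on a broken install.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_id)
    return asr(audio_path)["text"]
```

A call might look like `transcribe("sample.wav", "JackyHoCL/whisper-large-v3-turbo-cantonese-noise-detection")`, where `sample.wav` is a placeholder file name and the ID is the noise-detection variant named at the top of this card. Long recordings may need extra pipeline options such as chunking, which this sketch does not cover.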