This model is a fine-tuned version of openai/whisper-small on the wTIMIT-US dataset using the F1-Mask augmentation method. F1-Mask applies frequency masking below 1.1 kHz, targeting the energy region below the maximum first-formant (F1) frequency observed across whispered vowels, a region that is particularly susceptible to misclassification in whispered speech.

The masking is phoneme-aware and applied during training, helping the model focus on more informative frequency regions. For a detailed explanation and visualizations, see Section 3 of the thesis linked below.
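
As a rough illustration, the sketch below applies a low-frequency mask to a Whisper-style log-mel spectrogram in NumPy. The filterbank geometry (80 bins, HTK mel formula, 16 kHz audio), the per-example masking probability, and the spectrogram-floor fill value are assumptions made for the example, and the phoneme-aware gating described above is omitted; see the thesis for the exact training recipe.

```python
import numpy as np

def hz_to_mel(f_hz):
    # HTK-style mel scale (assumed here; Whisper's filterbank may differ slightly)
    return 2595.0 * np.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def f1_mask(log_mel, cutoff_hz=1100.0, sample_rate=16000, mask_prob=0.5):
    """Zero out mel bins centered below `cutoff_hz` in a (n_mels, n_frames) array."""
    n_mels = log_mel.shape[0]
    # Approximate center frequency of each triangular mel filter
    mel_edges = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_mels + 2)
    centers_hz = mel_to_hz(mel_edges[1:-1])
    low_bins = centers_hz < cutoff_hz  # bins below the whispered-vowel F1 ceiling

    out = log_mel.copy()
    if np.random.rand() < mask_prob:       # mask only a fraction of training examples
        out[low_bins, :] = log_mel.min()   # fill with the spectrogram floor
    return out

# Example: 80 mel bins x 3000 frames, as in Whisper's 30-second input
spec = np.random.randn(80, 3000).astype(np.float32)
augmented = f1_mask(spec)
```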

The model was evaluated on both the normal and whispered speech subsets. Results show that F1-Mask improves WER on normal speech, outperforming SpecAugment, and performs competitively on whispered speech as well.

Evaluation Results on wTIMIT-US (Test Set)

| Setup | Training Data | Augmentation | WER (Normal, %) | WER (Whispered, %) |
|---|---|---|---|---|
| No Fine-tuning | Zero-shot | None | 5.0 | 13.7 |
| Baseline | Both modes | None | 5.8 | 11.7 |
| SpecAugment | Both modes | SpecAugment (LD) | 5.2 | 12.3 |
| F1-Mask (Ours) | Both modes | F1-based Masking | 4.8 (β˜…, p=0.038) | 12.1 (ns, p=0.631) |

β˜… = statistically significant improvement over SpecAugment (paired MAPSSWE test)
ns = no statistically significant difference from SpecAugment

Compared to SpecAugment, F1-Mask improved normal-speech WER by 0.4 percentage points absolute (5.2% β†’ 4.8%, p=0.038), a statistically significant improvement. On whispered speech, it achieved a 0.2-point reduction (12.3% β†’ 12.1%, p=0.631), which is not statistically significant.
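
The reported WERs come from SCTK's sclite. Purely as an illustration of the metric itself, a quick WER check can be done in Python with the jiwer package (a different tool than the one used in the thesis, shown here on a TIMIT-style calibration sentence):

```python
from jiwer import wer  # pip install jiwer

reference  = "she had your dark suit in greasy wash water all year"
hypothesis = "she had a dark suit in greasy wash water all year"
# 1 substitution over 11 reference words -> WER of about 0.091
print(f"WER: {wer(reference, hypothesis):.3f}")
```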

Cite as

Kokowski, J. (2025). F0-Based Masking Policies for Self-Supervised Whispered Speech Recognition. Master’s Thesis, University of Groningen, Campus FryslΓ’n.
Available at: https://campus-fryslan.studenttheses.ub.rug.nl/id/eprint/674

If you use this model or build upon this work, please cite the thesis above.

Model: Whisper-small
Augmentation: F1-Mask
Evaluation toolkit: SCTK (sclite)
Notes: For complete results, including MAPSSWE and CER scores, refer to Section 5 of the thesis.
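
For inference, the checkpoint loads like any Whisper model through the πŸ€— Transformers ASR pipeline; the snippet below is a standard usage sketch (the audio path is a placeholder):

```python
from transformers import pipeline

# Load the fine-tuned checkpoint from the Hub and transcribe an audio file.
asr = pipeline(
    "automatic-speech-recognition",
    model="jankoko/PALF-F1-Whisper-small",
)
result = asr("example.wav")  # placeholder path; 16 kHz mono audio works best
print(result["text"])
```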
