XLS-R-based CTC model with 5-gram language model from Common Voice
This model is a version of facebook/wav2vec2-xls-r-2b-22-to-16 fine-tuned mainly on the MOZILLA-FOUNDATION/COMMON_VOICE_8_0 - NL dataset (see details below), with a small 5-gram language model added on top, trained on the Common Voice training corpus. The model achieves the following results on the evaluation set of Common Voice 8.0:
- Wer: 0.0669
- Cer: 0.0197
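WER (word error rate) and CER (character error rate) are both edit-distance-based metrics. The sketch below is not the card's own evaluation code, only a minimal illustration of how these numbers are computed: the Levenshtein distance between reference and hypothesis, divided by the reference length.

```python
def edit_distance(ref, hyp):
    # Classic Levenshtein dynamic program over token sequences,
    # using a single rolling row to save memory.
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            prev, d[j] = d[j], min(d[j] + 1,         # deletion
                                   d[j - 1] + 1,     # insertion
                                   prev + (r != h))  # substitution
    return d[len(hyp)]

def wer(reference: str, hypothesis: str) -> float:
    # Word error rate: edit distance over word tokens.
    ref, hyp = reference.split(), hypothesis.split()
    return edit_distance(ref, hyp) / len(ref)

def cer(reference: str, hypothesis: str) -> float:
    # Character error rate: edit distance over characters.
    return edit_distance(list(reference), list(hypothesis)) / len(reference)
```

So a WER of 0.0669 means roughly one word edit (substitution, insertion, or deletion) per fifteen reference words.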
Model description
The model takes 16 kHz audio as input and uses a Wav2Vec2ForCTC decoder with a 48-letter vocabulary to produce the transcription.
To improve accuracy, beam-search decoding is used; the beams are scored with a 5-gram language model trained on the Common Voice 8 corpus.
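A CTC decoder turns per-frame letter predictions into text by merging consecutive duplicates and dropping the blank token. The sketch below shows only that collapse rule in its greedy form; the model itself uses beam search rescored with the 5-gram LM, which keeps several candidate prefixes instead of one. The blank symbol `_` is an illustrative assumption, not necessarily this model's actual blank token.

```python
BLANK = "_"  # CTC blank symbol; assumed to be "_" for this sketch

def ctc_collapse(frame_labels):
    """Greedy CTC collapse: merge consecutive duplicates, drop blanks."""
    out = []
    prev = None
    for label in frame_labels:
        # Emit a label only when it differs from the previous frame
        # and is not the blank; repeated frames of the same letter
        # represent one emitted character.
        if label != prev and label != BLANK:
            out.append(label)
        prev = label
    return "".join(out)
```

Note that the blank also separates genuine double letters: `l_l` collapses to `ll`, while `ll` collapses to a single `l`.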
Intended uses & limitations
This model can be used to transcribe spoken Dutch (including Flemish) to text (without punctuation).
Training and evaluation data
- The model was initialized with the 2B-parameter model from Facebook.
- The model was then trained for 2000 iterations (batch size 32) on the dutch configuration of the multilingual_librispeech dataset.
- The model was then trained for 2000 iterations (batch size 32) on the nl configuration of the common_voice_8_0 dataset.
- The model was then trained for 6000 iterations (batch size 32) on the cgn dataset.
- The model was then trained for 6000 iterations (batch size 32) on the nl configuration of the common_voice_8_0 dataset.
Framework versions
- Transformers 4.17.0.dev0
- Pytorch 1.10.2+cu102
- Datasets 1.18.2.dev0
- Tokenizers 0.11.0
Evaluation results
- Test WER on Common Voice 8 (self-reported): 6.69
- Test CER on Common Voice 8 (self-reported): 1.97
- Test WER on Robust Speech Event - Dev Data (self-reported): 20.79
- Test CER on Robust Speech Event - Dev Data (self-reported): 10.72
- Test WER on Robust Speech Event - Test Data (self-reported): 19.71