Commit b6b745d
Parent(s): 578377c
Zero-shot performance highlighted
- README.md +11 -9
- images/cer_comparison_zero-shot_roest.png +0 -0
README.md
CHANGED
@@ -362,15 +362,17 @@ Comparison of results on different Danish benchmarks:
 The model was also tested against other datasets to evaluate generalizability:

+
+| | **Røst-wav2vec2-2B-v2** | | **Røst-wav2vec2-1B-v2** | | **Røst-wav2vec2-315M-v2** | | **Røst-wav2vec2-315M-v1** | | **Røst-whisper-large-v1** | |
+| ------------------------------------------------------------------------------------- | ----------------------- | --------- | ----------------------- | --------- | ------------------------- | --------- | ------------------------- | --------- | ------------------------- | --------- |
+| **Evaluation Dataset** | **WER %** | **CER %** | **WER %** | **CER %** | **WER %** | **CER %** | **WER %** | **CER %** | **WER %** | **CER %** |
+| [CoRal](https://huggingface.co/datasets/CoRal-project/coral/viewer/read_aloud/test) | 16.0 | 6.2 | 16.4 | 6.5 | 16.3 | 6.5 | 17.0 | 6.6 | **10.4** | **4.3** |
+| [NST-da](https://huggingface.co/datasets/alexandrainst/nst-da) | **27.0** | **11.7** | 27.7 | 11.9 | 28.4 | 12.4 | 29.7 | 13.9 | 29.8 | 14.5 |
+| [CommonVoice17](https://huggingface.co/datasets/mozilla-foundation/common_voice_17_0) | **12.0** | **4.5** | 26.3 | 10.9 | 14.4 | 5.4 | 16.7 | 6.6 | 15.6 | 8.2 |
+| [Fleurs-da_dk](https://huggingface.co/datasets/google/fleurs) | **12.5** | **5.1** | 13.7 | 5.5 | 15.6 | 6.1 | 16.6 | 6.3 | 12.6 | **5.1** |
+| [AlvenirOss](https://huggingface.co/datasets/Alvenir/alvenir_asr_da_eval) | **8.1** | **3.1** | 9.1 | 3.6 | 11.3 | 4.4 | 14.8 | 6.0 | 9.2 | 3.9 |
+| [AlvenirWiki](https://huggingface.co/datasets/Alvenir/alvenir_asr_da_eval) | **6.5** | **2.4** | 7.2 | 2.7 | 8.0 | 3.0 | 7.9 | 3.0 | 7.5 | 2.8 |
+

 **OBS!** The vocab used for training includes numerals (0,1,2,..,9), which are translated to text in a post-processing step. If the model misses spaces, the numbers are interpreted as one, which especially affects the NST score as this dataset contains many numerals.

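The effect described in the OBS note can be illustrated with a small sketch. It is not the project's evaluation or post-processing code: the `NUMERALS` table, the helper names, and the example sentence are hypothetical, and WER/CER are computed here with a plain Levenshtein distance rather than whatever library the authors used. It shows how one missed space between two digits changes the text produced by post-processing and, in turn, the word error rate.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (lists of words or strings)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance over reference length."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance over reference length."""
    return edit_distance(reference, hypothesis) / len(reference)


# Hypothetical, deliberately tiny numeral-to-text table (assumption for
# illustration only; a real post-processing step would cover all numbers).
NUMERALS = {"2": "to", "3": "tre", "23": "treogtyve"}


def numerals_to_text(text):
    """Replace each whitespace-separated numeral token with its written form."""
    return " ".join(NUMERALS.get(token, token) for token in text.split())


reference = "linje to tre"                    # spoken: "line two three"
with_space = numerals_to_text("linje 2 3")    # model kept the space
merged = numerals_to_text("linje 23")         # model missed the space

print(with_space, wer(reference, with_space))
print(merged, wer(reference, merged), cer(reference, merged))
```

With the space preserved, the post-processed text matches the reference exactly. With the space missing, the two digits are read as a single number, so two of the three reference words count as errors even though the model recognised both digits.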
images/cer_comparison_zero-shot_roest.png
ADDED