fsicoli/whisper-large-v3-pt-3000h-4 · Comparison between models

Apr 24

I saw that you have uploaded many models (thanks for this!), but each one is evaluated on a different dataset, so I do not know which one to pick. For example:

whisper-large-v3-pt-cv16-cuda is evaluated on mozilla-foundation/common_voice_16_0 pt dataset
whisper-large-v3-pt-1000h is evaluated on fsicoli/cv17-fleurs-coraa-mls-ted-alcaim-cf-cdc-lapsbm-lapsmail-sydney-lingualibre-voxforge-tatoeba default dataset
whisper-large-v3-pt-3000h-4 is evaluated on fsicoli/common_voice_18_0 pt dataset

I want to transcribe stand-up shows in Brazilian Portuguese. Should I use whisper-large-v3-pt-3000h-4, whisper-large-v3-pt-3000h, or another model? What would you recommend?

Thanks in advance for the help!

fsicoli

Owner Apr 25

Hello @valbarriere

I'm glad that you're interested in Brazilian Portuguese stand-up shows. There are many great comedians out there.

I would recommend using the best model I've created so far, which is fsicoli/whisper-large-v3-pt-cv19-fleurs.

Let me know how it works out for you.

Take care.

valbarriere

Apr 25

Hi @fsicoli thanks for the prompt answer!
I bet they are good, even though I do not understand them yet, haha.
OK, great, I'll use the one you recommend. Do you have official results for this model and a comparison with other models, at least with whisper-large-v3? I am looking for something I could cite in a publication to back up the claim that this ASR model works better on my data.
Thanks again for the help!
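
In case it helps to make the request concrete, here is a minimal sketch of the comparison I have in mind: score both models on the same reference transcripts with word error rate (WER), since numbers reported on different datasets cannot be compared directly. The function names (`edit_distance`, `corpus_wer`) are my own, the text normalization is deliberately crude (lowercasing and whitespace splitting only), and the sentences are made-up placeholders, not real outputs of any model.

```python
def edit_distance(ref_words, hyp_words):
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    # prev[j] = distance between the reference prefix seen so far and hyp_words[:j]
    prev = list(range(len(hyp_words) + 1))
    for i, r in enumerate(ref_words, start=1):
        curr = [i]
        for j, h in enumerate(hyp_words, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def corpus_wer(references, hypotheses):
    """Corpus-level WER: total word edits divided by total reference words."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    total_edits = 0
    total_words = 0
    for ref, hyp in zip(references, hypotheses):
        ref_words = ref.lower().split()
        hyp_words = hyp.lower().split()
        total_edits += edit_distance(ref_words, hyp_words)
        total_words += len(ref_words)
    if total_words == 0:
        raise ValueError("references contain no words")
    return total_edits / total_words


# Placeholder data: manually written references and two hypothetical systems.
refs = ["o gato preto dorme", "ela conta uma piada"]
hyp_a = ["o gato preto dorme", "ela conta uma piada"]
hyp_b = ["o rato preto dorme", "ela conta piada"]

print(corpus_wer(refs, hyp_a))  # 0.0
print(corpus_wer(refs, hyp_b))  # 0.25 (one substitution + one deletion over 8 words)
```

With real data, the hypotheses would be the transcripts produced by each model on the same audio clips, and the references would be manual transcriptions of those clips.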