Fine-tuning Whisper Turbo for Arabic: Expected Accuracy Improvements?

#12
by neethuvm - opened

Hi, I am considering fine-tuning the Whisper Turbo model for Arabic. Given that Arabic has its own set of phonetic and linguistic challenges, I believe that fine-tuning this model on an Arabic dataset, such as Common Voice or custom transcriptions, could significantly improve its accuracy for Arabic speech.

Hey, have you had any success with this task? I am looking for guidance or insights on the same. Also, could you suggest any open-source datasets for Arabic audio?

https://huggingface.co/datasets/mozilla-foundation/common_voice_11_0/viewer/ar You can use this for basic Arabic fine-tuning.
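For anyone who wants to try that dataset, here is a minimal sketch of loading the Arabic split with the `datasets` library. It assumes the dataset is still available under that ID, that you have accepted its terms and logged in with a Hugging Face token, and that your `datasets` version can load it. The helper names `columns_to_drop` and `load_arabic_split` are made up for this example.

```python
# Columns needed for ASR fine-tuning: the waveform and its transcript.
KEEP_COLUMNS = ("audio", "sentence")


def columns_to_drop(column_names):
    """Return every column name except the audio and transcript columns."""
    return [name for name in column_names if name not in KEEP_COLUMNS]


def load_arabic_split(split="train"):
    """Load one Arabic split and resample it for Whisper (hypothetical helper)."""
    # Imported lazily so columns_to_drop works without `datasets` installed.
    from datasets import Audio, load_dataset

    ds = load_dataset("mozilla-foundation/common_voice_11_0", "ar", split=split)
    ds = ds.remove_columns(columns_to_drop(ds.column_names))
    # Whisper's feature extractor expects 16 kHz audio, so resample on access.
    return ds.cast_column("audio", Audio(sampling_rate=16_000))
```

Calling `load_arabic_split("train")` downloads the data on first use, so it may take a while and needs a fair amount of disk space.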

I am using openai/whisper-large-v3-turbo to transcribe speech in English or Arabic. In Arabic, when the user says نعم, it is transcribed incorrectly as "Naah" or "Naahe". Has anyone faced the same issue, and what is the solution?
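One thing that may be worth trying, assuming the Latin-script output comes from Whisper's automatic language detection guessing wrong on a very short utterance (this is a guess, not something confirmed in this thread): pass the language explicitly instead of letting the model detect it. A rough sketch with the `transformers` pipeline follows; `build_generate_kwargs` and `transcribe` are hypothetical helper names.

```python
def build_generate_kwargs(language=None):
    """Build generation options; language=None keeps automatic detection."""
    kwargs = {"task": "transcribe"}  # transcribe, do not translate to English
    if language is not None:
        kwargs["language"] = language  # e.g. "arabic" or "english"
    return kwargs


def transcribe(audio_path, language=None):
    """Transcribe one audio file, optionally forcing the spoken language."""
    # Imported lazily so the helper above works without `transformers` installed.
    from transformers import pipeline

    asr = pipeline(
        "automatic-speech-recognition",
        model="openai/whisper-large-v3-turbo",
    )
    result = asr(audio_path, generate_kwargs=build_generate_kwargs(language))
    return result["text"]
```

For example, `transcribe("clip.wav", language="arabic")` forces Arabic decoding, where `clip.wav` is a placeholder path. This only helps if the application already knows which language the user is speaking.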
