Fine-tuning Whisper Turbo for Arabic: Expected Accuracy Improvements?

#12

by neethuvm - opened Oct 3, 2024

Oct 3, 2024

Hi, I am considering fine-tuning the Whisper Turbo model for Arabic. Given that Arabic has its own phonetic and linguistic challenges, I believe that fine-tuning this model on an Arabic dataset, such as Common Voice or custom transcriptions, could significantly improve its accuracy on Arabic speech.
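To put a number on "accuracy" before and after fine-tuning, a common choice for speech recognition is word error rate (WER). The snippet below is a minimal pure-Python sketch, not part of any Whisper tooling; the function name `wer` and the whitespace tokenization are assumptions made for illustration, and Arabic text would normally need consistent normalization before scoring.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

Scoring the same held-out Arabic clips with the base model and with the fine-tuned model gives two WER values that can be compared directly.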

DhruvWappnet

Oct 21, 2024

Hey, have you had any success with this task? I am looking for guidance or insights on the same. Also, could you suggest any open-source Arabic audio datasets?

neethuvm

Oct 24, 2024

You can use this for basic Arabic fine-tuning data: https://huggingface.co/datasets/mozilla-foundation/common_voice_11_0/viewer/ar

deepdml

Nov 13, 2024

I'm doing some fine-tuning for Arabic:

razaullah314

10 days ago

edited 10 days ago

I am using openai/whisper-large-v3-turbo to transcribe speech in English or Arabic. In Arabic, when the user says نعم, it is transcribed incorrectly, e.g. as "Naah" or "Naahe". Has anyone faced the same issue, and what is the solution?
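One thing that may be worth checking here is whether the language is being auto-detected. Romanized output such as "Naah" can appear when a short Arabic utterance is detected as English. The sketch below assumes the Hugging Face `transformers` pipeline is in use and passes the language explicitly through `generate_kwargs`; the helper names `contains_arabic`, `build_generate_kwargs`, and `transcribe` are hypothetical, and this is a guess at the cause, not a confirmed fix.

```python
def contains_arabic(text: str) -> bool:
    """Return True if any character lies in the basic Arabic Unicode block."""
    return any("\u0600" <= ch <= "\u06FF" for ch in text)


def build_generate_kwargs(language: str = "arabic") -> dict:
    """Generation options that pin the language and keep the task as transcription."""
    return {"language": language, "task": "transcribe"}


def transcribe(audio_path: str, language: str = "arabic") -> str:
    """Transcribe one audio file with the language forced instead of auto-detected."""
    # Imported lazily so the helpers above work without transformers installed.
    from transformers import pipeline

    asr = pipeline(
        "automatic-speech-recognition",
        model="openai/whisper-large-v3-turbo",
    )
    result = asr(audio_path, generate_kwargs=build_generate_kwargs(language))
    return result["text"]
```

Because the input can be English or Arabic, forcing the language only helps when it is known up front. Otherwise, `contains_arabic` can flag outputs that came back in Latin script so they can be retried with `language="arabic"`.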
