Whisper Medium Egyptian Arabic (whisper-medium-egy)

This model is a fine-tuned version of openai/whisper-medium, trained on a custom dataset of approximately 72 hours of Egyptian Arabic speech. It is designed for automatic speech recognition (ASR) of the Egyptian Arabic dialect.

Model Description

  • Base Model: openai/whisper-medium
  • Language: Arabic (ar), specifically focused on Egyptian dialect (arz)
  • Fine-tuning Dataset: MAdel121/arabic-egy-cleaned (approx. 72 hours)
  • Total Training Steps: 7299
  • Epochs: 10

Intended Uses & Limitations

This model is intended for transcribing speech in Egyptian Arabic.

Intended Use:

  • Automatic transcription of audio recordings and live speech in Egyptian Arabic.
  • Assisting with content creation, subtitling, and voice-controlled applications for Egyptian Arabic speakers.

Limitations:

  • Performance may degrade in highly noisy environments or with very strong, non-Egyptian accents.
  • The model was fine-tuned on a specific dataset; its performance on significantly different domains or audio characteristics might vary.
  • The sources and domains of the training data are not documented here. Performance is likely to be better on audio that resembles the training data.

How to Use

You can use this model with the transformers library and the pipeline interface for ease of use.

from transformers import pipeline
import torch

device = "cuda:0" if torch.cuda.is_available() else "cpu"

pipe = pipeline(
  "automatic-speech-recognition",
  model="MAdel121/whisper-medium-egy",
  device=device
)

# Example with a local audio file
# audio_file = "path/to/your/egyptian_arabic_audio.wav"
# transcription = pipe(audio_file, generate_kwargs={"language": "arabic"})["text"]
# print(transcription)

# Example with a Hugging Face dataset audio sample
# from datasets import load_dataset
# ds = load_dataset("MAdel121/arabic-egy-cleaned", "ar", split="validation") # Or your test split
# sample = ds[0]["audio"] # Make sure your dataset has an "audio" column
# result = pipe(sample.copy(), generate_kwargs={"language": "arabic"})
# print(result["text"])

The model ID on the Hugging Face Hub is MAdel121/whisper-medium-egy. Passing generate_kwargs={"language": "arabic"} sets Whisper's language token explicitly instead of relying on automatic language detection, which helps keep the transcription in the target language.

Training Data

The model was fine-tuned on the MAdel121/arabic-egy-cleaned dataset available on the Hugging Face Hub. This dataset contains approximately 72 hours of Egyptian Arabic audio paired with transcripts.

Training Procedure

The model was trained using the transformers library. The fine-tuning process involved the following key hyperparameters:

  • Base Model: openai/whisper-medium
  • Optimizer: AdamW
  • Learning Rate: 1e-5 (0.00001)
  • Warmup Steps: 1000
  • Weight Decay: 0.05
  • Gradient Accumulation Factor: 2
  • Batch Size (loader_batch_size): 8 (effective batch size with gradient accumulation: 8 × 2 = 16)
  • Number of Epochs: 10
  • Max Grad Norm: 5
  • Augmentations Used:
    • use_drop_freq: true
    • use_drop_chunk: true
    • use_drop_bit_resolution: true
    • Other augmentations like use_add_noise, use_speed_perturb, use_pitch_shift, use_add_reverb, use_codec_augment, use_gain were set to false
  • Task: transcribe
  • Language: ar
  • Seed: 1986
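
To make the numbers above easier to relate to each other, here is a minimal sketch that derives the effective batch size and the average steps per epoch, and evaluates the learning rate at a given step. The schedule shape is an assumption: the card lists warmup steps but does not name the scheduler, so the sketch uses linear warmup followed by linear decay to zero. It also assumes that the 7299 total steps count optimizer updates. The function name lr_at_step is hypothetical.

```python
# Values taken from the hyperparameter list above.
BASE_LR = 1e-5
WARMUP_STEPS = 1000
TOTAL_STEPS = 7299
EPOCHS = 10
BATCH_SIZE = 8
GRAD_ACCUM = 2

# Number of samples contributing to each optimizer update.
effective_batch_size = BATCH_SIZE * GRAD_ACCUM

# Average optimizer steps per epoch (assumes TOTAL_STEPS counts optimizer updates).
steps_per_epoch = TOTAL_STEPS / EPOCHS


def lr_at_step(step: int) -> float:
    """Learning rate at `step` under an ASSUMED linear-warmup, linear-decay schedule."""
    if step < WARMUP_STEPS:
        # Ramp linearly from 0 up to BASE_LR over the warmup phase.
        return BASE_LR * step / WARMUP_STEPS
    # Decay linearly from BASE_LR at the end of warmup to 0 at the final step.
    remaining = max(0, TOTAL_STEPS - step)
    return BASE_LR * remaining / (TOTAL_STEPS - WARMUP_STEPS)


print("effective batch size:", effective_batch_size)
print("steps per epoch:", steps_per_epoch)
for s in (0, 500, 1000, 4000, 7299):
    print(s, lr_at_step(s))
```

If the actual run used a different scheduler (for example cosine decay or a constant rate after warmup), only the second branch of lr_at_step would change.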

Training was done on a single A100 (80GB) GPU on Modal Labs.

The training was managed and tracked using Weights & Biases under the project whisper-medium-egyptian-arabic with resume ID r3sz4v27.

Training Code

The training code is available on GitHub.

Weights & Biases

The run can be found at https://wandb.ai/m-adelomar1/whisper-medium-egyptian-arabic/

Evaluation Results

The model was evaluated on the validation split of the MAdel121/arabic-egy-cleaned dataset.

  • Word Error Rate (WER): 18.03%
  • Character Error Rate (CER): 13.38%

These metrics indicate the performance of the model on the validation set. Lower values are better.
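
For readers unfamiliar with these metrics, the following is a minimal sketch of how WER and CER are commonly defined: the edit distance between reference and hypothesis, divided by the reference length, computed over words or characters respectively. The card does not say which tool or text normalization produced the reported figures, so this is an illustration of the definitions rather than the evaluation script used here; the function names are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, 1):
        curr = [i]  # deleting the first i reference items to match an empty hypothesis
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate for one utterance: word-level edits / reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate for one utterance: character-level edits / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


# One dropped word out of four reference words.
print(wer("انا رايح الشغل دلوقتي", "انا رايح الشغل"))
```

Corpus-level scores are usually computed by summing edits and reference lengths over all utterances before dividing, rather than averaging per-utterance rates.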

BibTeX Citation

@misc{madel_2025_whisper_medium_egy,
  author       = {Madel},
  title        = {Whisper Medium Fine-tuned for Egyptian Arabic},
  year         = {2025},
  publisher    = {Hugging Face},
  journal      = {Hugging Face Hub},
  howpublished = {\url{https://huggingface.co/MAdel121/whisper-medium-egy}}
}