Whisper Medium Egyptian Arabic (whisper-medium-egy)
This model is a fine-tuned version of openai/whisper-medium on a custom dataset of 72 hours of Egyptian Arabic speech. It's designed for Automatic Speech Recognition (ASR) for the Egyptian Arabic dialect.
Model Description
- Base Model: openai/whisper-medium
- Language: Arabic (ar), specifically focused on Egyptian dialect (arz)
- Fine-tuning Dataset: MAdel121/arabic-egy-cleaned (approx. 72 hours)
- Total Training Steps: 7299
- Epochs: 10
Intended Uses & Limitations
This model is intended for transcribing speech in Egyptian Arabic.
Intended Use:
- Automatic transcription of audio recordings and live speech in Egyptian Arabic.
- Assisting with content creation, subtitling, and voice-controlled applications for Egyptian Arabic speakers.
Limitations:
- Performance may degrade in highly noisy environments or with very strong, non-Egyptian accents.
- The model was fine-tuned on a specific dataset; its performance on significantly different domains or audio characteristics might vary.
- The sources and domains of the training audio are not documented in this card. Performance is likely to be best on audio that resembles the training data.
How to Use
You can use this model with the transformers library and its pipeline interface.
from transformers import pipeline
import torch
device = "cuda:0" if torch.cuda.is_available() else "cpu"
pipe = pipeline(
    "automatic-speech-recognition",
    model="MAdel121/whisper-medium-egy",
    device=device,
)
# Example with a local audio file
# audio_file = "path/to/your/egyptian_arabic_audio.wav"
# transcription = pipe(audio_file, generate_kwargs={"language": "arabic"})["text"]
# print(transcription)
# Example with a Hugging Face dataset audio sample
# from datasets import load_dataset
# ds = load_dataset("MAdel121/arabic-egy-cleaned", "ar", split="validation") # Or your test split
# sample = ds[0]["audio"] # Make sure your dataset has an "audio" column
# result = pipe(sample.copy(), generate_kwargs={"language": "arabic"})
# print(result["text"])
The example loads the model from the Hugging Face Hub as MAdel121/whisper-medium-egy. Passing generate_kwargs={"language": "arabic"} forces Whisper to decode in Arabic instead of relying on its automatic language detection, so a wrong language guess cannot affect the transcription.
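Whisper processes audio in 30-second windows, so longer recordings need chunking. The following is a minimal sketch, not taken from the original card, of how the transformers pipeline can do this. The helper name transcribe_long and its defaults are hypothetical, the model ID is assumed to be MAdel121/whisper-medium-egy, and reading a file path assumes ffmpeg is installed.

```python
def transcribe_long(audio_path, model_id="MAdel121/whisper-medium-egy", chunk_length_s=30):
    """Transcribe a long recording by letting the pipeline split it into chunks.

    Hypothetical helper for illustration; the name and defaults are assumptions.
    """
    # Imported lazily so that defining the helper does not need the heavy dependencies.
    import torch
    from transformers import pipeline

    device = "cuda:0" if torch.cuda.is_available() else "cpu"
    pipe = pipeline(
        "automatic-speech-recognition",
        model=model_id,
        chunk_length_s=chunk_length_s,  # split long audio into windows of this length
        device=device,
    )
    result = pipe(
        audio_path,
        return_timestamps=True,  # adds a "chunks" list with per-segment timestamps
        generate_kwargs={"language": "arabic", "task": "transcribe"},
    )
    return result["text"], result["chunks"]
```

Calling transcribe_long("path/to/your/egyptian_arabic_audio.wav") would return the full transcript together with timestamped segments, which is a convenient starting point for subtitling.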
Training Data
The model was fine-tuned on the MAdel121/arabic-egy-cleaned dataset available on the Hugging Face Hub. This dataset contains approximately 72 hours of Egyptian Arabic audio paired with transcripts.
Training Procedure
The model was trained using the transformers library. The fine-tuning process involved the following key hyperparameters:
- Base Model: openai/whisper-medium
- Optimizer: AdamW
- Learning Rate: 1e-5 (0.00001)
- Warmup Steps: 1000
- Weight Decay: 0.05
- Gradient Accumulation Factor: 2
- Batch Size (loader_batch_size): 8 (effective batch size with gradient accumulation: 8 × 2 = 16)
- Number of Epochs: 10
- Max Grad Norm: 5
- Augmentations Used:
  - use_drop_freq: true
  - use_drop_chunk: true
  - use_drop_bit_resolution: true
  - Other augmentations (use_add_noise, use_speed_perturb, use_pitch_shift, use_add_reverb, use_codec_augment, use_gain) were set to false
- Task: transcribe
- Language: ar
- Seed: 1986
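The card does not say which library implements the augmentation flags listed above, so the following is only an illustration of what two of the enabled perturbations typically do to a waveform, not the training code itself. The function names, parameters, and defaults are hypothetical. Frequency dropping is omitted because it needs a notch filter.

```python
import random


def drop_chunks(samples, n_chunks=2, max_len=4, seed=1986):
    """Return a copy of `samples` with up to `n_chunks` random spans set to zero."""
    rng = random.Random(seed)  # fixed seed so the illustration is reproducible
    out = list(samples)
    for _ in range(n_chunks):
        length = rng.randint(1, max_len)
        start = rng.randint(0, max(0, len(out) - length))
        for i in range(start, min(len(out), start + length)):
            out[i] = 0.0  # silence this part of the signal
    return out


def drop_bit_resolution(samples, bits=8):
    """Quantize samples in [-1, 1] to `bits` bits, then map them back to floats."""
    levels = 2 ** (bits - 1) - 1
    return [round(max(-1.0, min(1.0, s)) * levels) / levels for s in samples]


wave = [0.5] * 20
print(drop_chunks(wave))
print(drop_bit_resolution([0.0, 0.3, 1.0], bits=8))
```

Both perturbations leave the transcript unchanged while degrading the audio slightly, which is why they are commonly used to make ASR models more robust.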
Training was done on 1x A100 (80GB) on Modal Labs.
The training was managed and tracked using Weights & Biases under the project whisper-medium-egyptian-arabic, with resume ID r3sz4v27.
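The hyperparameters above imply a few derived quantities, worked out in the sketch below. Two points are assumptions rather than facts from the card: that the reported total counts optimizer steps, and that the warmup is linear with the rate held at its peak afterwards (the schedule after warmup is not stated).

```python
LOADER_BATCH_SIZE = 8   # from the card
GRAD_ACCUM = 2          # gradient accumulation factor, from the card
TOTAL_STEPS = 7299      # from the card
EPOCHS = 10             # from the card
PEAK_LR = 1e-5          # from the card
WARMUP_STEPS = 1000     # from the card

effective_batch = LOADER_BATCH_SIZE * GRAD_ACCUM
steps_per_epoch = TOTAL_STEPS / EPOCHS
warmup_fraction = WARMUP_STEPS / TOTAL_STEPS


def warmup_lr(step):
    """Learning rate under an ASSUMED linear warmup, held at the peak afterwards."""
    return PEAK_LR * min(1.0, step / WARMUP_STEPS)


print(effective_batch)             # 16
print(steps_per_epoch)             # 729.9
print(round(warmup_fraction, 3))   # 0.137
print(warmup_lr(500))              # 5e-06
```

Under these assumptions the warmup covers roughly the first 14% of training, a bit more than one epoch.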
Training Code
The training code is available on GitHub.
Weights & Biases
The run can be found here: https://wandb.ai/m-adelomar1/whisper-medium-egyptian-arabic/
Evaluation Results
The model was evaluated on the validation split of the MAdel121/arabic-egy-cleaned dataset.
- Word Error Rate (WER): 18.03%
- Character Error Rate (CER): 13.38%
These metrics indicate the performance of the model on the validation set. Lower values are better.
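For readers unfamiliar with the two metrics, the sketch below shows how WER and CER are conventionally defined: the edit distance between reference and hypothesis, divided by the reference length in words or characters. It is not the evaluation script used for this model, and the card does not say what text normalization (for example, diacritic removal) was applied before scoring. The example sentences are made up.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate of one utterance; the reference must not be empty."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate of one utterance; spaces count as characters."""
    return edit_distance(reference, hypothesis) / len(reference)


reference = "انا رايح الشغل بكرة"
hypothesis = "انا رايح شغل بكرة"
print(f"WER: {wer(reference, hypothesis):.2%}")
print(f"CER: {cer(reference, hypothesis):.2%}")
```

Here one of four reference words is wrong, so the WER is 25%, while only two of nineteen characters differ, which gives a much lower CER. Over a whole dataset the errors and reference lengths are usually summed across utterances before dividing, rather than averaging per-utterance rates.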
BibTeX Citation
@misc{madel_2025_whisper_medium_egy,
  author = {Madel},
  title = {Whisper Medium Fine-tuned for Egyptian Arabic},
  year = {2025},
  publisher = {Hugging Face},
  journal = {Hugging Face Hub},
  howpublished = {\url{https://huggingface.co/MAdel121/whisper-medium-egy}}
}