whisper-large-v3-Tarteel
Model Description
This model is a fine-tuned version of OpenAI’s Whisper Large V3 model, adapted specifically for Arabic Quranic speech recognition using the Tarteel AI Everyayah Dataset. It is optimized to transcribe Quranic recitations with improved accuracy on this specialized dataset.
Training Details
- Base model: openai/whisper-large-v3
- Dataset: Tarteel AI Everyayah Dataset (language: Arabic, splits: train + validation)
- Training steps: 5000
- Batch size: 16
- Learning rate: 1e-5
- Gradient checkpointing: enabled
- FP16 mixed precision: enabled
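The hyperparameters above map onto keyword arguments of Hugging Face Seq2SeqTrainingArguments. The card does not say which training loop was used, so this is a hypothetical sketch. It assumes the Transformers Seq2SeqTrainer was used and treats the batch size of 16 as a per-device value. output_dir and predict_with_generate are also assumptions, not stated in the card. The sketch only builds the keyword arguments, so it runs without Transformers installed.

```python
from typing import Any


def tarteel_training_kwargs(output_dir: str = "./whisper-large-v3-tarteel") -> dict[str, Any]:
    """Hypothetical mapping of the card's hyperparameters to Seq2SeqTrainingArguments kwargs."""
    return {
        "output_dir": output_dir,            # placeholder path (assumption)
        "max_steps": 5000,                   # Training steps: 5000
        "per_device_train_batch_size": 16,   # Batch size: 16 (assumed to be per device)
        "learning_rate": 1e-5,               # Learning rate: 1e-5
        "gradient_checkpointing": True,      # Gradient checkpointing: enabled
        "fp16": True,                        # FP16 mixed precision: enabled (typically needs a CUDA GPU)
        "predict_with_generate": True,       # assumption: decode during evaluation so WER can be computed
    }


if __name__ == "__main__":
    for key, value in tarteel_training_kwargs().items():
        print(f"{key} = {value}")
```

With Transformers installed, these would be passed as Seq2SeqTrainingArguments(**tarteel_training_kwargs()). Because gradient checkpointing is enabled, setting model.config.use_cache = False before training avoids the use_cache warning mentioned under Known Issues.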
Loss and Metrics
- Training loss decreased to near zero
- Validation WER (word error rate) decreased steadily to approximately 48%
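WER is the number of word-level substitutions (S), deletions (D) and insertions (I) needed to turn a hypothesis into the reference, divided by the number of reference words N: WER = (S + D + I) / N. The card does not state which implementation or text normalization produced the ~48% figure. The sketch below is only an illustration: a minimal plain-Python version that computes Levenshtein distance over whitespace-separated words, with no normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference words processed so far
    # and the first j hypothesis words (only one DP row is kept at a time).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution (or a match when cost == 0)
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One deleted word out of four reference words.
    print(word_error_rate("بسم الله الرحمن الرحيم", "بسم الله الرحيم"))  # 0.25
```

Insertions are counted too, so WER can exceed 100%. For larger evaluations, the Hugging Face evaluate library exposes the same metric via evaluate.load("wer").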
Known Issues / Notes
- The training process showed a warning that use_cache=True is incompatible with gradient checkpointing; this was handled automatically by disabling use_cache.
- Attention mask warnings appear when the pad token is the same as the EOS token; providing explicit attention masks is recommended for reliable inference.
- This model is intended for Arabic Quranic speech only and may not perform well on other Arabic speech domains.
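Explicit masks matter because of how padding interacts with the end-of-sequence token. When pad_token_id equals eos_token_id, a mask inferred as "token != pad" also hides the genuine end-of-sequence token, so the mask cannot be inferred reliably from token values. The sketch below is a hypothetical, library-free illustration with made-up token IDs. It builds the mask from the true sequence lengths instead of from token values.

```python
EOS_ID = 2       # hypothetical token ID; the pad token reuses it
PAD_ID = EOS_ID


def naive_mask(batch: list[list[int]]) -> list[list[int]]:
    """Infers the mask from token values, which is wrong when PAD_ID == EOS_ID."""
    return [[0 if tok == PAD_ID else 1 for tok in seq] for seq in batch]


def pad_with_mask(sequences: list[list[int]]) -> tuple[list[list[int]], list[list[int]]]:
    """Right-pads sequences and builds the mask from their real lengths."""
    max_len = max(len(seq) for seq in sequences)
    padded = [seq + [PAD_ID] * (max_len - len(seq)) for seq in sequences]
    mask = [[1] * len(seq) + [0] * (max_len - len(seq)) for seq in sequences]
    return padded, mask


if __name__ == "__main__":
    padded, mask = pad_with_mask([[5, 7, EOS_ID], [5, EOS_ID]])
    print(padded)              # [[5, 7, 2], [5, 2, 2]]
    print(mask)                # [[1, 1, 1], [1, 1, 0]]
    print(naive_mask(padded))  # [[1, 1, 0], [1, 0, 0]]: real EOS tokens are masked out
```

With Transformers, one way to follow the recommendation (assuming a recent version) is to request the mask from the processor with return_attention_mask=True. You then pass it on as model.generate(inputs.input_features, attention_mask=inputs.attention_mask).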
Intended Use
- Automatic speech recognition (ASR) of Quranic recitations in Arabic.
- Useful for Quranic audio transcription and research related to Islamic studies.
- Not intended for general Arabic speech recognition or other languages.
Usage Example
from transformers import WhisperProcessor, WhisperForConditionalGeneration
import torch
import librosa
model_name = "IJyad/whisper-large-v3-Tarteel"
processor = WhisperProcessor.from_pretrained(model_name)
model = WhisperForConditionalGeneration.from_pretrained(model_name)
# Load audio resampled to 16 kHz, the rate Whisper expects (replace with your audio file)
audio, rate = librosa.load("path_to_quran_audio.wav", sr=16000)
input_features = processor(audio, sampling_rate=rate, return_tensors="pt").input_features
# Generate transcription
predicted_ids = model.generate(input_features)
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
print(transcription)
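Whisper encodes fixed 30-second windows, and by default the processor pads or truncates its input to that length. The example above therefore only transcribes the first 30 seconds of a longer recitation. One simple workaround, sketched below, is to split the 16 kHz waveform into consecutive windows of at most 30 seconds and transcribe each one with the same processor, generate and batch_decode calls. split_into_windows is a hypothetical helper, not part of Transformers. Naive splitting can cut a word at a window boundary. The Transformers "automatic-speech-recognition" pipeline with chunk_length_s=30 handles overlapping chunks instead.

```python
import numpy as np

SAMPLE_RATE = 16_000   # Whisper's expected sampling rate
WINDOW_SECONDS = 30    # length of one Whisper input window


def split_into_windows(audio: np.ndarray, sample_rate: int = SAMPLE_RATE,
                       window_seconds: int = WINDOW_SECONDS) -> list[np.ndarray]:
    """Splits a mono waveform into consecutive, non-overlapping windows.

    The last window may be shorter; the processor pads it to 30 s.
    """
    if audio.ndim != 1:
        raise ValueError("expected a mono (1-D) waveform")
    window = sample_rate * window_seconds
    return [audio[start:start + window] for start in range(0, len(audio), window)]


if __name__ == "__main__":
    # 70 s of silence stands in for a long recitation.
    dummy = np.zeros(SAMPLE_RATE * 70, dtype=np.float32)
    print([len(chunk) / SAMPLE_RATE for chunk in split_into_windows(dummy)])  # [30.0, 30.0, 10.0]
```

The per-window transcriptions can then be joined with spaces to form the full text.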
Model tree for IJyad/whisper-large-v3-Tarteel
Base model
openai/whisper-large-v3