IJyad/whisper-large-v3-Tarteel

Automatic Speech Recognition


IJyad committed 24 days ago

Commit 175e7c6 · verified · 1 Parent(s): ef5a151

Create README.md

Files changed (1)

README.md +73 -0

README.md ADDED

	@@ -0,0 +1,73 @@

+---
+license: mit
+datasets:
+- tarteel-ai/everyayah
+language:
+- ar
+metrics:
+- wer
+base_model:
+- openai/whisper-large-v3
+pipeline_tag: automatic-speech-recognition
+tags:
+- speech-to-text
+- automatic-speech-recognition
+- arabic
+- quran
+- whisper
+- fine-tuned
+---
+# whisper-large-v3-Tarteel
+## Model Description
+This model is a fine-tuned version of OpenAI’s Whisper Large V3, adapted for Arabic Quranic speech recognition using the Tarteel AI Everyayah Dataset (`tarteel-ai/everyayah`). It is intended to transcribe Quranic recitations; the reported accuracy is measured on this dataset's validation split.
+## Training Details
+- **Base model:** openai/whisper-large-v3
+- **Dataset:** Tarteel AI Everyayah Dataset (language: Arabic, splits: train + validation)
+- **Training steps:** 5000
+- **Batch size:** 16
+- **Learning rate:** 1e-5
+- **Gradient checkpointing:** enabled
+- **FP16 mixed precision:** enabled
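For readers who want to reproduce a similar setup, the hyperparameters listed above can be sketched as a Hugging Face `Seq2SeqTrainingArguments` configuration. This is an assumption, not the author's confirmed training script: the README does not say which trainer was used, whether the batch size of 16 is per device or global (per device is assumed here), and `output_dir` is a hypothetical path.

```python
# Hyperparameters copied from the Training Details list.
# Assumption: "Batch size: 16" is interpreted as a per-device batch size.
HYPERPARAMS = {
    "max_steps": 5000,
    "per_device_train_batch_size": 16,
    "learning_rate": 1e-5,
    "gradient_checkpointing": True,
    "fp16": True,
}


def build_training_args(output_dir="./whisper-large-v3-tarteel"):
    """Build training arguments from HYPERPARAMS.

    The import is local so the dictionary can be inspected without
    having transformers installed. `output_dir` is a hypothetical path.
    Note that fp16=True requires a supported GPU at training time.
    """
    from transformers import Seq2SeqTrainingArguments

    return Seq2SeqTrainingArguments(output_dir=output_dir, **HYPERPARAMS)
```

Calling `build_training_args()` on a GPU machine would return an object that can be passed to `Seq2SeqTrainer`; the dataset preparation and data collator are not covered by this sketch.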
+### Loss and Metrics
+- Training loss decreased to near zero
+- Validation WER (word error rate; lower is better) improved steadily, reaching ~48%
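WER is the word-level edit distance between the reference and the hypothesis, divided by the number of reference words. The sketch below is a minimal, dependency-free illustration of that definition. The function name `word_error_rate` is hypothetical, it is not necessarily the implementation behind the figure reported above, and it applies no text normalization (for example, Arabic diacritics are compared as written).

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# One of four reference words is missing from the hypothesis.
print(word_error_rate("بسم الله الرحمن الرحيم", "بسم الله الرحيم"))  # 0.25
```

Because the denominator is the reference length, WER can exceed 100% when the hypothesis contains many inserted words.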
+## Known Issues / Notes
+- During training, a warning reported that `use_cache=True` is incompatible with gradient checkpointing; it was handled automatically by disabling `use_cache`.
+- Attention mask warnings appear when the pad token is the same as the EOS token; providing explicit attention masks is recommended for reliable inference.
+- This model is intended for Arabic Quranic speech only and may not perform well on other Arabic speech domains.
+## Intended Use
+- Automatic speech recognition (ASR) of Quranic recitations in Arabic.
+- Useful for Quranic audio transcription and research related to Islamic studies.
+- Not intended for general Arabic speech recognition or other languages.
+## Usage Example
+```python
+from transformers import WhisperProcessor, WhisperForConditionalGeneration
+import torch
+import librosa
+model_name = "ijyad/whisper-large-v3-Tarteel"
+processor = WhisperProcessor.from_pretrained(model_name)
+model = WhisperForConditionalGeneration.from_pretrained(model_name)
+# Load audio (replace with your audio file)
+audio, rate = librosa.load("path_to_quran_audio.wav", sr=16000)
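+# Request the attention mask explicitly, as recommended under Known Issues
+inputs = processor(audio, sampling_rate=rate, return_tensors="pt", return_attention_mask=True)
+# Generate transcription (no gradients are needed for inference)
+with torch.no_grad():
+    predicted_ids = model.generate(inputs.input_features, attention_mask=inputs.attention_mask)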
+transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
+print(transcription)