Whisper-Base Qurʾān LoRA 🕋📖

A low-rank adaptation (LoRA) fine-tune of tarteel-ai/whisper-base-ar-quran for Arabic Qurʾān recitation (tilāwah).
It provides diacritic-sensitive ASR with a test WER of ≈ 5.98 %, beating:

| Model | WER ↓ | Δ vs ours |
|---|---|---|
| KheemP/whisper-base-quran-lora (this adapter) | 0.0598 | - |
| tarteel-ai/whisper-base-ar-quran | 0.073 | −1.3 × |
| tarteel-ai/whisper-tiny-ar-quran | 0.096 | −1.6 × |
| NVIDIA FastConformer large (NeMo) | ≈ 0.069 | −1.2 × |

(All scores measured on the same 610-ayah hold-out set, with no text normalisation – tashkīl included).


Quick start

from transformers import WhisperProcessor, WhisperForConditionalGeneration
from peft import PeftModel
import torch, librosa

base_id = "tarteel-ai/whisper-base-ar-quran"
lora_id = "KheemP/whisper-base-quran-lora"

device = "cuda" if torch.cuda.is_available() else "cpu"
dtype  = torch.float16 if device == "cuda" else torch.float32

# load the base model, inject the LoRA adapter, and load the processor
model = WhisperForConditionalGeneration.from_pretrained(base_id, torch_dtype=dtype).to(device)
model = PeftModel.from_pretrained(model, lora_id)
model.eval()
proc  = WhisperProcessor.from_pretrained(base_id)

# transcribe an mp3 -> text (librosa decodes and resamples to 16 kHz mono)
audio, _ = librosa.load("my_recitation.mp3", sr=16_000)
features = proc(audio, sampling_rate=16_000, return_tensors="pt").input_features.to(device, dtype)
pred_ids = model.generate(input_features=features)
print(proc.decode(pred_ids[0], skip_special_tokens=True))

⚠️ *This repo only stores the LoRA adapter (~2 MB). The code above automatically downloads the original Whisper base model and injects the adapter.*
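
If you prefer a standalone checkpoint, the adapter can also be folded into the base weights with peft's merge_and_unload() (standard peft usage; the output directory name below is just illustrative):

merged = model.merge_and_unload()   # plain WhisperForConditionalGeneration, no peft wrapper
merged.save_pretrained("whisper-base-quran-merged")
proc.save_pretrained("whisper-base-quran-merged")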


Model details

| Setting | Value |
|---|---|
| Backbone | Whisper Base (77 M params) |
| LoRA rank / α / dropout | 8 / 16 / 0.05 |
| Trainable params | 0.59 M (0.8 %) |
| Epochs | 5 |
| Batch size / grad-accum | 2 × 4 (effective = 8) |
| LR / schedule | 5 · 10⁻⁴, constant |
| Mixed precision | fp16 |
| Hardware | single NVIDIA A100 40 GB |
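
The schedule above maps onto Seq2SeqTrainingArguments roughly as follows (a hedged sketch based only on the table; output_dir and anything not listed there are assumptions, not the actual training script):

from transformers import Seq2SeqTrainingArguments

args = Seq2SeqTrainingArguments(
    output_dir="whisper-base-quran-lora",  # assumed name
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,         # effective batch size = 8
    learning_rate=5e-4,
    lr_scheduler_type="constant",
    num_train_epochs=5,
    fp16=True,
)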

Target modules

q_proj, k_proj, v_proj, and out_proj in the self-attention blocks of both the encoder and decoder, as well as in the decoder's cross-attention blocks.
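
In peft terms this corresponds roughly to the configuration below (a minimal sketch; the bias setting and anything else not stated on this card are assumptions):

from peft import LoraConfig, get_peft_model
from transformers import WhisperForConditionalGeneration

base = WhisperForConditionalGeneration.from_pretrained("tarteel-ai/whisper-base-ar-quran")

lora_cfg = LoraConfig(
    r=8,                   # LoRA rank
    lora_alpha=16,         # scaling factor α
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "out_proj"],
    bias="none",           # assumption
)

model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()  # should report roughly 0.59 M trainable params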


Training data

  • Dataset: 446 k MP3 ayāt scraped from https://quran.ksu.edu.sa, resampled to 16 kHz and paired with canonical text from all_ayat.json.

  • Filtering (see the sketch after this list):

    • keep recordings ≤ 30 s long (→ 6091 ayāt)
    • keep only the shortest recording per ayah
    • 90 / 10 split ⇒ 5481 train / 610 test
  • Reciters: 37, sampled round-robin to keep voices balanced.
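
The filtering and split above amount to something like the following (a hypothetical sketch: the file name, column names, and seed are illustrative assumptions, not the actual preprocessing code):

import pandas as pd

# hypothetical index: one row per (ayah, reciter) MP3 with its duration
df = pd.read_csv("ayah_recordings.csv")

df = df[df["duration_s"] <= 30]                                # keep clips ≤ 30 s
df = df.sort_values("duration_s").groupby("ayah_id").head(1)   # shortest recording per ayah

test  = df.sample(frac=0.10, random_state=42)                  # ≈ 610 ayāt
train = df.drop(test.index)                                    # ≈ 5481 ayāt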


Evaluation

  • Metric: jiwer WER with no normalisation (diacritics matter).
  • Result: 0.0598 on the 610-ayah test split (95 % CI ± 0.003).
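
Scoring follows the usual jiwer pattern on raw strings, so every missing or altered diacritic counts as an error. A minimal sketch (not the exact evaluation script):

import jiwer

refs = ["قُلْ هُوَ اللَّهُ أَحَدٌ"]   # reference text, tashkīl included
hyps = ["قل هو الله أحد"]             # same words with the diacritics stripped

print(jiwer.wer(refs, hyps))          # 1.0: every word differs once tashkīl is removed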

Intended use & limitations

Designed for speech-to-text of Qurʾān recitation in Modern Standard Arabic. It is not expected to work well on:

  • conversational Arabic, dialects, or non-Qurʾānic liturgy
  • noisy recordings or low-quality microphones
  • verses longer than 30 seconds

Citation

@software{quran_whisper_lora_2024,
  author       = {Kheem Dharmani},
  title        = {Whisper-Base Qurʾān LoRA Adapter},
  year         = 2024,
  url          = {https://huggingface.co/KheemP/whisper-base-quran-lora}
}

Licence

The backbone weights are released under MIT (same as Whisper). The dataset is sourced from the public domain. The adapter itself is also released under MIT.

