MMS Fine-tuned model for urd-script_arabic

This model is a fine-tuned version of facebook/mms-1b-all on the urdu-asr/csalt-voice dataset.

Model description

This is an adapter-based fine-tuned model: the MMS-1B base model (about 965M parameters) is kept frozen, and only the language-specific adapter weights are trained for Urdu speech recognition. The model transcribes Urdu written in Arabic script.
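The frozen-base/trainable-adapter idea can be illustrated with a toy PyTorch module (this is not the actual MMS architecture; the layer names and sizes here are invented for illustration). A small bottleneck adapter is added on top of a frozen "base" layer, so only a fraction of the parameters receive gradients:

```python
import torch
import torch.nn as nn

# Toy illustration of adapter-based fine-tuning (not the real MMS layers):
# a frozen "base" projection plus a small trainable residual adapter.
class AdapterBlock(nn.Module):
    def __init__(self, dim=16, bottleneck=4):
        super().__init__()
        self.base = nn.Linear(dim, dim)           # stands in for pretrained weights
        self.adapter_down = nn.Linear(dim, bottleneck)
        self.adapter_up = nn.Linear(bottleneck, dim)

    def forward(self, x):
        h = self.base(x)
        # residual adapter: base output + small learned correction
        return h + self.adapter_up(torch.relu(self.adapter_down(h)))

block = AdapterBlock()
for p in block.base.parameters():
    p.requires_grad = False  # freeze the base; only adapters train

trainable = sum(p.numel() for p in block.parameters() if p.requires_grad)
total = sum(p.numel() for p in block.parameters())
print(f"trainable: {trainable} / {total}")  # → trainable: 148 / 420
```

Scaled up to MMS-1B, the same principle means only a few million adapter parameters are updated per language while the ~1B-parameter backbone stays fixed.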

Training and evaluation

  • Training Data: urdu-asr/csalt-voice dataset
  • Training Time: Approximately 3.7 hours
  • Final Word Error Rate (WER): 41.92%
  • Final Loss: 0.9216
  • Training Method: Adapter-based fine-tuning with frozen base model

The model was trained for 98.11 epochs, with the best checkpoint saved at the end of training.
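For reference, the WER figure above counts word-level substitutions, insertions, and deletions (Levenshtein distance) divided by the number of reference words. A minimal self-contained implementation of that metric:

```python
# Minimal word error rate (WER): Levenshtein distance over words,
# normalized by the reference length.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[-1][-1] / len(ref)

print(wer("the cat sat", "the cat sat down"))  # → 0.3333... (one insertion / 3 words)
```

In practice a library such as `jiwer` or `evaluate` is typically used for this, but the computation is the same.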

Usage

from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC
import torch
import librosa

# Load model & processor
processor = Wav2Vec2Processor.from_pretrained("atariq701/mms-1b-allFT-urdu-asr-csalt-voice-urd-script_arabic")
model = Wav2Vec2ForCTC.from_pretrained("atariq701/mms-1b-allFT-urdu-asr-csalt-voice-urd-script_arabic")

# Load audio (ensure 16kHz sampling rate)
audio_input, sample_rate = librosa.load("path/to/audio.wav", sr=16000)

# Process audio
inputs = processor(audio_input, sampling_rate=sample_rate, return_tensors="pt")
with torch.no_grad():
    logits = model(inputs.input_values).logits

# Decode
predicted_ids = torch.argmax(logits, dim=-1)
transcription = processor.batch_decode(predicted_ids)[0]
print(transcription)
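The argmax-plus-`batch_decode` step above performs greedy CTC decoding. Conceptually, `batch_decode` collapses consecutive repeated token ids and drops the CTC blank token; a sketch of that logic with a toy vocabulary (the ids and characters here are invented for illustration, not the real tokenizer's vocabulary):

```python
# What greedy CTC decoding does under the hood: collapse consecutive
# repeats, then drop the blank token. Toy vocabulary for illustration only.
BLANK = 0
vocab = {1: "س", 2: "ل", 3: "ا", 4: "م"}

def ctc_collapse(ids):
    out, prev = [], None
    for i in ids:
        if i != prev and i != BLANK:  # skip repeats and blanks
            out.append(i)
        prev = i
    return "".join(vocab[i] for i in out)

print(ctc_collapse([1, 1, 0, 2, 3, 3, 0, 4]))  # → "سلام"
```

Note that a blank between two identical ids keeps them distinct, which is how CTC represents genuinely doubled characters.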