# MMS Fine-tuned model for urd-script_arabic
This model is a fine-tuned version of facebook/mms-1b-all on the urdu-asr/csalt-voice dataset.
## Model description
This is an adapter-based fine-tuned model: the facebook/mms-1b-all base is kept frozen, and only the language-specific adapter layers are trained for Urdu speech recognition. The model targets Urdu written in Arabic script.
## Training and evaluation
- Training Data: urdu-asr/csalt-voice dataset
- Training Time: Approximately 3.7 hours
- Final Word Error Rate (WER): 41.92%
- Final Loss: 0.9216
- Training Method: Adapter-based fine-tuning with frozen base model
The model was trained for 98.11 epochs, with the best model saved at the end.
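The Word Error Rate reported above is the standard ASR metric: the word-level edit distance between reference and hypothesis, divided by the number of reference words. A minimal self-contained sketch (the function names here are illustrative, not from this repository):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences (one-row DP)."""
    n = len(hyp)
    dp = list(range(n + 1))
    for i in range(1, len(ref) + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,                          # deletion
                        dp[j - 1] + 1,                      # insertion
                        prev + (ref[i - 1] != hyp[j - 1]))  # substitution
            prev = cur
    return dp[n]

def wer(reference, hypothesis):
    """Word Error Rate: word-level edit distance over reference length."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)

# Two words missing out of six -> WER of 2/6
print(round(wer("the cat sat on the mat", "the cat sat mat"), 3))  # 0.333
```

In practice the reported 41.92% would be computed over the whole evaluation split (total edits over total reference words), e.g. with the `evaluate` library's `wer` metric.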
## Usage
```python
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC
import torch
import librosa

# Load model & processor
processor = Wav2Vec2Processor.from_pretrained("atariq701/mms-1b-allFT-urdu-asr-csalt-voice-urd-script_arabic")
model = Wav2Vec2ForCTC.from_pretrained("atariq701/mms-1b-allFT-urdu-asr-csalt-voice-urd-script_arabic")

# Load audio (ensure 16 kHz sampling rate)
audio_input, sample_rate = librosa.load("path/to/audio.wav", sr=16000)

# Process audio
inputs = processor(audio_input, sampling_rate=16000, return_tensors="pt")

# Run inference
with torch.no_grad():
    logits = model(inputs.input_values).logits

# Decode
predicted_ids = torch.argmax(logits, dim=-1)
transcription = processor.batch_decode(predicted_ids)[0]
print(transcription)
```
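The `argmax` + `batch_decode` step above is greedy CTC decoding: take the most likely token per frame, merge consecutive repeats, and drop blank tokens. A pure-Python sketch of the collapse step (the blank id of 0 and the token ids here are hypothetical, for illustration only):

```python
def ctc_greedy_collapse(frame_ids, blank_id=0):
    """Collapse a per-frame argmax sequence CTC-style:
    merge consecutive repeated ids, then drop blanks."""
    out = []
    prev = None
    for t in frame_ids:
        if t != prev and t != blank_id:
            out.append(t)
        prev = t
    return out

# Frames: [blank, 7, 7, blank, 7, 8, 8]
# The blank between the runs of 7 keeps them as two separate tokens.
print(ctc_greedy_collapse([0, 7, 7, 0, 7, 8, 8]))  # [7, 7, 8]
```

`processor.batch_decode` performs this collapse internally and then maps the remaining ids to characters of the Urdu (Arabic-script) vocabulary.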
## Base model

facebook/mms-1b-all