---
language: mar
license: apache-2.0
tags:
- audio
- automatic-speech-recognition
- speech
- marathi
datasets:
- openslr
model-index:
- name: Marathi ASR
  results:
  - task:
      type: automatic-speech-recognition
      name: Speech Recognition
    dataset:
      name: Marathi OpenSLR Dataset
      type: openslr
    metrics:
    - name: Word Error Rate
      type: wer
      value: "Your WER here" # Replace with your model's WER
---

# Marathi ASR Model

This is a fine-tuned Wav2Vec2-BERT model for Automatic Speech Recognition (ASR) in the Marathi language.

## Model Details

- **Model Type:** Wav2Vec2-BERT for CTC
- **Language:** Marathi
- **Training Dataset:** OpenSLR Marathi Dataset
- **Last Updated:** April 16, 2025

## Usage

```python
from transformers import Wav2Vec2BertProcessor, Wav2Vec2BertForCTC
import torchaudio
import torch

# Load model and processor
processor = Wav2Vec2BertProcessor.from_pretrained("hriteshMaikap/marathi-asr-model")
model = Wav2Vec2BertForCTC.from_pretrained("hriteshMaikap/marathi-asr-model")

# Load audio
waveform, sample_rate = torchaudio.load("audio.wav")

# Resample to 16 kHz if needed
if sample_rate != 16000:
    resampler = torchaudio.transforms.Resample(sample_rate, 16000)
    waveform = resampler(waveform)
    sample_rate = 16000

# Convert to mono if needed
if waveform.shape[0] > 1:
    waveform = torch.mean(waveform, dim=0, keepdim=True)

# Convert to a numpy array for the processor
speech_array = waveform.squeeze().numpy()

# Transcribe
inputs = processor(speech_array, sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    logits = model(inputs.input_features).logits
predicted_ids = torch.argmax(logits, dim=-1)
transcription = processor.decode(predicted_ids[0])
print(transcription)
```

## Training Results

| Step | Training Loss | Validation Loss | WER      |
|------|---------------|-----------------|----------|
| 300  | 0.211100      | 0.220232        | 0.183333 |
| 600  | 0.086900      | 0.172057        | 0.113889 |
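The WER values above were computed on the validation split during training. If you want to score the model on your own audio, the sketch below shows one way to do it with the `jiwer` package; the audio file names and reference transcripts are placeholders (assumptions for illustration), not files shipped with this model.

```python
# Minimal sketch: compute WER for this model on a small evaluation set.
# Assumes jiwer is installed (pip install jiwer); paths/references are placeholders.
import torch
import torchaudio
from jiwer import wer
from transformers import Wav2Vec2BertForCTC, Wav2Vec2BertProcessor

processor = Wav2Vec2BertProcessor.from_pretrained("hriteshMaikap/marathi-asr-model")
model = Wav2Vec2BertForCTC.from_pretrained("hriteshMaikap/marathi-asr-model")
model.eval()

def transcribe(path: str) -> str:
    """Load a WAV file, convert to 16 kHz mono, and return the model's transcription."""
    waveform, sample_rate = torchaudio.load(path)
    if sample_rate != 16000:
        waveform = torchaudio.transforms.Resample(sample_rate, 16000)(waveform)
    if waveform.shape[0] > 1:
        waveform = torch.mean(waveform, dim=0, keepdim=True)
    inputs = processor(waveform.squeeze().numpy(), sampling_rate=16000, return_tensors="pt")
    with torch.no_grad():
        logits = model(inputs.input_features).logits
    predicted_ids = torch.argmax(logits, dim=-1)
    return processor.decode(predicted_ids[0])

# Hypothetical (audio path, reference transcript) pairs — replace with your own data.
samples = [
    ("eval_001.wav", "संदर्भ वाक्य एक"),
    ("eval_002.wav", "संदर्भ वाक्य दोन"),
]
references = [ref for _, ref in samples]
hypotheses = [transcribe(path) for path, _ in samples]
print(f"WER: {wer(references, hypotheses):.4f}")
```

Note that WER is sensitive to text normalization (punctuation, digit formatting), so normalize references and hypotheses consistently before comparing against the numbers in the table above.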