🎧 Khmer Whisper ASR: songhieng/whisper-khmer-v1

This model is a fine-tuned version of openai/whisper-tiny, trained on Khmer-language audio data. It is designed for automatic speech recognition (ASR) in Khmer (ខ្មែរ).

Training loss at step 5200: 0.095900

🧠 Model Details

  • Base model: openai/whisper-tiny
  • Language: Khmer (km)
  • Task: Transcription (task="transcribe")
  • Fine-tuned on: Male voice Khmer audio dataset from tts-data-kh
  • Owner: songhieng

🚀 How to Use

from transformers import WhisperProcessor, WhisperForConditionalGeneration
import torchaudio
import torch

# Load processor and model
processor = WhisperProcessor.from_pretrained("songhieng/whisper-khmer-v1")
model = WhisperForConditionalGeneration.from_pretrained("songhieng/whisper-khmer-v1")
model.eval()

# Set to avoid forced decoder ID error
model.config.forced_decoder_ids = None
model.generation_config.forced_decoder_ids = None

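# Load audio, mix down to mono, and resample to the 16 kHz rate Whisper expects
speech_array, sampling_rate = torchaudio.load("your_audio.wav")
speech_array = speech_array.mean(dim=0)  # (channels, samples) -> (samples,)
if sampling_rate != 16000:
    resampler = torchaudio.transforms.Resample(sampling_rate, 16000)
    speech_array = resampler(speech_array)

# Extract log-mel input features (pass the sampling rate explicitly)
input_features = processor(speech_array.numpy(), sampling_rate=16000, return_tensors="pt").input_features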

# Transcribe
with torch.no_grad():
    predicted_ids = model.generate(input_features)
    text = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]

print("📝 Transcription:", text)
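
By default, the Whisper feature extractor pads or truncates each input to a 30-second window, so the snippet above would likely transcribe only the first 30 seconds of a longer recording. One possible workaround is to split the waveform into windows and transcribe each one in turn. The helper below is a minimal sketch of that idea; `chunk_bounds` and its parameters are hypothetical names introduced for illustration and are not part of this model or of transformers.

```python
from typing import List, Tuple


def chunk_bounds(
    num_samples: int,
    sample_rate: int = 16000,
    chunk_seconds: float = 30.0,
) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices that cover the audio in
    consecutive windows of at most `chunk_seconds` each.

    Hypothetical helper: fixed-length, non-overlapping windows only.
    """
    if num_samples < 0:
        raise ValueError("num_samples must be non-negative")
    chunk_len = int(sample_rate * chunk_seconds)
    if chunk_len < 1:
        raise ValueError("sample_rate * chunk_seconds must be at least 1 sample")
    # The final window is clipped to the end of the audio.
    return [
        (start, min(start + chunk_len, num_samples))
        for start in range(0, num_samples, chunk_len)
    ]


# Example: a 70-second clip at 16 kHz yields two full windows and a 10-second tail.
bounds = chunk_bounds(70 * 16000)
print(bounds)
```

Each slice `speech_array[start:end]` can then go through the same processor and `model.generate` steps as above, with the resulting texts joined in order. Fixed-length cuts can split a word at a window boundary, so this is only a starting point.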

Model size: 37.8M params (Safetensors, tensor type F32)