whisper-tiny-khmer-mlx-fp16

This model was converted to MLX format from openai-whisper-tiny, then fine-tuned on Khmer speech using two datasets:

seanghay/khmer_mpwt_speech
seanghay/km-speech-corpus

It achieves the following word error rates (WER) on two popular datasets:

80.2% on test split of google/fleurs km-kh
63.2% on train split of openslr/openslr SLR42
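
For readers unfamiliar with the metric, here is a minimal sketch of how a word error rate can be computed: the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions for illustration. Khmer is usually written without spaces between words, so the figures above may depend on a different word segmentation than the one shown here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    # Assumption: words are separated by whitespace. Real Khmer text
    # typically needs a word segmenter before this step.
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# One substitution and one deletion against four reference words.
print(word_error_rate("a b c d", "a x c"))
```

A WER of 80.2% therefore means roughly four word-level errors for every five reference words, and the value can exceed 100% when the hypothesis contains many insertions.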

NOTE: The MLX format runs on Apple silicon (M-series chips).

Use with mlx

pip install mlx-whisper

Write a Python script, example.py, as follows:

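import mlx_whisper

# Placeholder path: replace with the audio file you want to transcribe.
SPEECH_FILE_NAME = "audio.wav"

# Download the model from the Hugging Face Hub (on first use) and transcribe.
result = mlx_whisper.transcribe(
    SPEECH_FILE_NAME,
    path_or_hf_repo="Kimang18/whisper-tiny-khmer-mlx-fp16",
    fp16=True,
)
print(result["text"])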

Then run example.py to print the transcription.
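
Besides the full text, the returned dictionary normally also carries a list of timestamped segments. The sketch below shows one way to print them; it assumes each entry in `result["segments"]` has `start`, `end`, and `text` keys, as in the original Whisper output format. The helper `format_segments` and the sample dictionary are hypothetical and stand in for a real transcription result.

```python
def format_segments(result: dict) -> list[str]:
    """Turn a transcription result into one timestamped line per segment."""
    lines = []
    # Assumption: each segment has "start" and "end" (seconds) and "text".
    for seg in result.get("segments", []):
        text = seg["text"].strip()
        lines.append(f"[{seg['start']:.2f}s - {seg['end']:.2f}s] {text}")
    return lines


# Hypothetical stand-in for the dictionary returned by the transcription call.
sample_result = {
    "text": " first segment second segment",
    "segments": [
        {"start": 0.0, "end": 1.5, "text": " first segment"},
        {"start": 1.5, "end": 3.25, "text": " second segment"},
    ],
}

for line in format_segments(sample_result):
    print(line)
```

In example.py, the same helper could be called on `result` in place of the sample dictionary.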

You can also run it from the command line in a terminal:

mlx_whisper --model Kimang18/whisper-tiny-khmer-mlx-fp16 --task transcribe SPEECH_FILE_NAME --fp16 True
