How to?

#1
by etemiz - opened

Hi,
Thanks for the model. How do I use it?
whisper.load_model() does not work.

Hi,

whisper.load_model() only works with OpenAI's original models. For custom models like mine, use either:

  1. Transformers (easiest):

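    from transformers import pipeline

    # Downloads the fine-tuned checkpoint from the Hugging Face Hub on first use
    pipe = pipeline("automatic-speech-recognition", model="ysdede/whisper-khanacademy-large-v3-turbo-tr")
    result = pipe("audio.mp3")
    print(result["text"])  # the transcription is returned under the "text" key
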

    Example Notebook

  2. Faster Inference (CT2 backend):
    I recommend the optimized ct2 version for better speed:
    ysdede/whisper-khanacademy-large-v3-turbo-tr-ct2
    Works with faster-whisper or whisperx
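
For faster-whisper, loading the CT2 model from Python could look roughly like the sketch below. The device and compute_type values are assumptions for a CPU-only machine, and format_segment and transcribe are hypothetical helper names used only for illustration.

```python
def format_segment(start: float, end: float, text: str) -> str:
    """Render one segment as '[start -> end] text', with times in seconds."""
    return f"[{start:.2f}s -> {end:.2f}s] {text.strip()}"


def transcribe(audio_path: str) -> None:
    # Imported here so the formatting helper works without faster-whisper installed.
    from faster_whisper import WhisperModel

    # Assumption: CPU inference with int8 quantization. On a GPU,
    # device="cuda" with compute_type="float16" is a common choice.
    model = WhisperModel(
        "ysdede/whisper-khanacademy-large-v3-turbo-tr-ct2",
        device="cpu",
        compute_type="int8",
    )

    # transcribe() returns a lazy generator of segments plus language info;
    # decoding only happens while the generator is being iterated.
    segments, info = model.transcribe(audio_path, language="tr")
    for segment in segments:
        print(format_segment(segment.start, segment.end, segment.text))


# Example call (needs faster-whisper installed and a local audio file):
# transcribe("audio.mp3")
```

Passing language="tr" skips language detection, which matches the --language tr flag used with whisperx in this thread.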

For whisperx, simply run:

whisperx audio.mp3 --model ysdede/whisper-khanacademy-large-v3-turbo-tr-ct2

Or use a batch file (Windows):

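@echo off
rem The first argument is the input file. Quotes around it are stripped.
set "input_file=%~1"
rem output_file is only used for the log line below.
set "output_file=%~dpn1.vtt"
set "output_dir=%~dp1"
rem Drop the trailing backslash so it cannot escape the closing quote of --output_dir.
if "%output_dir:~-1%"=="\" set "output_dir=%output_dir:~0,-1%"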

echo Input file: %input_file%
echo Output file: %output_file%
echo Output directory: %output_dir%

whisperx.exe "%input_file%" --language tr --output_format vtt --compute_type int8 --model "ysdede/whisper-khanacademy-large-v3-turbo-tr-ct2" --segment_resolution sentence --verbose True --batch_size 1 --print_progress True --output_dir "%output_dir%"
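
On Linux or macOS, where the batch file does not apply, the same call could be wrapped in a small Python script. This is only a sketch: it assumes the whisperx executable is on PATH, reuses the flags from the batch file unchanged, and build_whisperx_command and run_whisperx are hypothetical names.

```python
import subprocess
from pathlib import Path
from typing import List

MODEL_ID = "ysdede/whisper-khanacademy-large-v3-turbo-tr-ct2"


def build_whisperx_command(input_file: str) -> List[str]:
    """Build the whisperx argument list, mirroring the batch file's flags."""
    # Write the subtitle file next to the input, as the batch file does.
    output_dir = Path(input_file).resolve().parent
    return [
        "whisperx", str(input_file),
        "--language", "tr",
        "--output_format", "vtt",
        "--compute_type", "int8",
        "--model", MODEL_ID,
        "--segment_resolution", "sentence",
        "--verbose", "True",
        "--batch_size", "1",
        "--print_progress", "True",
        "--output_dir", str(output_dir),
    ]


def run_whisperx(input_file: str) -> None:
    # check=True raises CalledProcessError if whisperx exits with an error.
    subprocess.run(build_whisperx_command(input_file), check=True)


# Example call (needs whisperx installed):
# run_whisperx("lecture.mp4")
```

Passing the arguments as a list avoids shell quoting problems, so the trailing-backslash workaround from the batch file is not needed here.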

Important Note:
This model is not superior to vanilla large-v3 for general Turkish. It is fine-tuned specifically to recognize Khan Academy's teaching style (educational terms, lecture cadence). For everyday speech, OpenAI's original model may perform better.
