How to use the strict model with WhisperX?

#12
by tophee - opened

I'm not sure if I'm missing something or if it is simply not possible to use the strict revision of the model with WhisperX.

I have tried the following:

model = whisperx.load_model(
    "KBLab/kb-whisper-large", device, compute_type=compute_type, download_root="cache", revision="strict"
)

But WhisperX does not accept a revision parameter. Nor does it accept it via asr_options, like so:

model = whisperx.load_model(
    "KBLab/kb-whisper-large", device, compute_type=compute_type, download_root="cache", asr_options={"revision": "strict"}
)

And adding @strict to the model name doesn't work either:

model = whisperx.load_model(
    "KBLab/kb-whisper-large@strict", device, compute_type=compute_type, download_root="cache" 
)

This produces the following error: "huggingface_hub.errors.HFValidationError: Repo id must use alphanumeric chars or '-', '_', '.', '--' and '..' are forbidden, '-' and '.' cannot start or end the name, max length is 96: 'KBLab/kb-whisper-large@strict'."
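The error message itself spells out the constraint: "@" is not an allowed character in a repo id, so a revision can only be selected via a separate argument, never by appending it to the name. A rough re-implementation of the quoted rule (an approximation for illustration only, not huggingface_hub's actual validator):

```python
import re

# Approximates the repo-id rule quoted in the HFValidationError above:
# alphanumerics plus '-', '_', '.', with '--' and '..' forbidden and a
# 96-character limit. The point: '@' is simply not in the alphabet.
REPO_ID_RE = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*(/[A-Za-z0-9._-]+)?$")

def is_valid_repo_id(repo_id: str) -> bool:
    if len(repo_id) > 96:
        return False
    if ".." in repo_id or "--" in repo_id:
        return False
    return REPO_ID_RE.fullmatch(repo_id) is not None

print(is_valid_repo_id("KBLab/kb-whisper-large"))         # True
print(is_valid_repo_id("KBLab/kb-whisper-large@strict"))  # False: '@' is rejected
```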

I also downloaded the following files from the strict branch:

total 6040336
drwxr-xr-x@  11 myname  staff         352 Jun  5 11:07 .
drwx------@ 423 myname  staff       13536 Jun  5 11:07 ..
drwxr-xr-x    4 myname  staff         128 Jun  5 11:07 .cache
-rw-r--r--@   1 myname  staff        6148 Jun  5 11:12 .DS_Store
-rw-r--r--    1 myname  staff       34648 Jun  5 10:01 added_tokens.json
-rw-r--r--    1 myname  staff        1326 Jun  5 10:01 config.json
-rw-r--r--    1 myname  staff      493869 Jun  5 10:01 merges.txt
-rw-r--r--    1 myname  staff  3087284276 Jun  5 10:07 model.bin
-rw-r--r--    1 myname  staff       52666 Jun  5 10:01 normalizer.json
-rw-r--r--    1 myname  staff     3931383 Jun  5 10:01 tokenizer.json
-rw-r--r--    1 myname  staff      835528 Jun  5 10:01 vocab.json

and tried

model = whisperx.load_model(
   downloaded_model_path, device, compute_type=compute_type, download_root="cache" 
)

But then it complained: "Cannot load the vocabulary from the model directory".

National Library of Sweden / KBLab org

We have not converted the non-standard versions of kb-whisper to formats used by external libraries.

You can do the following to convert our strict version to a WhisperX-compatible format:

import subprocess

from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="KBLab/kb-whisper-large",
    revision="strict",
    local_dir="models/kb-whisper-large",
    local_dir_use_symlinks=False,  # deprecated and ignored in recent huggingface_hub; safe to omit
    allow_patterns=["*.safetensors", "*.json", "*.txt"],
)

# Install ctranslate2 if not already installed: `pip install ctranslate2`
# Run the following command in the terminal to convert the model to CTranslate2 format:
# ct2-transformers-converter --model models/kb-whisper-large --output_dir models/kb-whisper-large-ct2 --copy_files tokenizer.json preprocessor_config.json --quantization float16

# Or alternatively, you can run the conversion using subprocess in Python:
subprocess.run(
    [
        "ct2-transformers-converter",
        "--model",
        "models/kb-whisper-large",
        "--output_dir",
        "models/kb-whisper-large-ct2",
        "--copy_files",
        "tokenizer.json",
        "preprocessor_config.json",
        "--quantization",
        "float16",
    ],
    check=True,
)
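If the converter runs without errors, a quick sanity check confirms the output directory contains the files WhisperX will look for. The file list below is an assumption based on the invocation above: model.bin and config.json are CTranslate2's standard outputs, and the other two were requested via --copy_files.

```python
from pathlib import Path

# Expected contents of the CTranslate2 output directory (assumption based
# on the converter flags above, not a guarantee for every version).
EXPECTED_FILES = [
    "model.bin",
    "config.json",
    "tokenizer.json",
    "preprocessor_config.json",
]

def missing_ct2_files(model_dir: str) -> list[str]:
    """Return the expected files that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]

# Usage after conversion:
# missing = missing_ct2_files("models/kb-whisper-large-ct2")
# if missing:
#     raise RuntimeError(f"Conversion looks incomplete, missing: {missing}")
```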

Then you should be able to load and use the converted model from the local output_dir you specified:

# Load the converted model
import whisperx

model = whisperx.load_model(
    "models/kb-whisper-large-ct2",
    device="cuda",
    compute_type="float16",
)

audio = whisperx.load_audio("data/audio.wav")
result = model.transcribe(audio, batch_size=8)
