How to use the strict model with WhisperX?
Am I missing something, or is it simply not possible to use the strict revision of the model with WhisperX?
I have tried the following:
model = whisperx.load_model(
    "KBLab/kb-whisper-large", device, compute_type=compute_type, download_root="cache", revision="strict"
)
But WhisperX does not accept the revision parameter. Neither does it accept it via asr_options, like so:
model = whisperx.load_model(
    "KBLab/kb-whisper-large", device, compute_type=compute_type, download_root="cache", asr_options={"revision": "strict"}
)
And adding @strict to the model name doesn't work either:
model = whisperx.load_model(
    "KBLab/kb-whisper-large@strict", device, compute_type=compute_type, download_root="cache"
)
This produces the following error: "huggingface_hub.errors.HFValidationError: Repo id must use alphanumeric chars or '-', '_', '.', '--' and '..' are forbidden, '-' and '.' cannot start or end the name, max length is 96: 'KBLab/kb-whisper-large@strict'."
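As far as I can tell, the @revision suffix is rejected by the Hub's repo-id validation itself, before WhisperX even gets involved. A rough stdlib sketch of the rule from the error message (simplified, not the Hub's exact validator):

```python
import re

# Hub repo ids allow alphanumerics plus '-', '_', '.', split into
# namespace/name by a single '/'. This regex is a simplification of
# the rule quoted in the HFValidationError above.
REPO_ID_RE = re.compile(r"^[A-Za-z0-9][\w.-]*/[A-Za-z0-9][\w.-]*$")

def looks_like_valid_repo_id(repo_id: str) -> bool:
    return bool(REPO_ID_RE.fullmatch(repo_id)) and len(repo_id) <= 96

print(looks_like_valid_repo_id("KBLab/kb-whisper-large"))         # True
print(looks_like_valid_repo_id("KBLab/kb-whisper-large@strict"))  # False: '@' is not allowed
```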
I also downloaded the following files from the strict branch:
total 6040336
drwxr-xr-x@ 11 myname staff 352 Jun 5 11:07 .
drwx------@ 423 myname staff 13536 Jun 5 11:07 ..
drwxr-xr-x 4 myname staff 128 Jun 5 11:07 .cache
-rw-r--r--@ 1 myname staff 6148 Jun 5 11:12 .DS_Store
-rw-r--r-- 1 myname staff 34648 Jun 5 10:01 added_tokens.json
-rw-r--r-- 1 myname staff 1326 Jun 5 10:01 config.json
-rw-r--r-- 1 myname staff 493869 Jun 5 10:01 merges.txt
-rw-r--r-- 1 myname staff 3087284276 Jun 5 10:07 model.bin
-rw-r--r-- 1 myname staff 52666 Jun 5 10:01 normalizer.json
-rw-r--r-- 1 myname staff 3931383 Jun 5 10:01 tokenizer.json
-rw-r--r-- 1 myname staff 835528 Jun 5 10:01 vocab.json
and tried:
model = whisperx.load_model(
    downloaded_model_path, device, compute_type=compute_type, download_root="cache"
)
But then it complained: "Cannot load the vocabulary from the model directory".
We have not converted the non-standard versions of kb-whisper to formats used by external libraries.
You can do the following to convert our strict version to a WhisperX-compatible format:
import subprocess

from huggingface_hub import snapshot_download

# Download only the weights, tokenizer, and config files from the strict branch
snapshot_download(
    repo_id="KBLab/kb-whisper-large",
    revision="strict",
    local_dir="models/kb-whisper-large",
    local_dir_use_symlinks=False,
    allow_patterns=["*.safetensors", "*.json", "*.txt"],
)

# Install ctranslate2 if not already installed: `pip install ctranslate2`
# Run the following command in the terminal to convert the model to CTranslate2 format:
# ct2-transformers-converter --model models/kb-whisper-large --output_dir models/kb-whisper-large-ct2 --copy_files tokenizer.json preprocessor_config.json --quantization float16
# Alternatively, run the conversion from Python using subprocess:
subprocess.run(
    [
        "ct2-transformers-converter",
        "--model",
        "models/kb-whisper-large",
        "--output_dir",
        "models/kb-whisper-large-ct2",
        "--copy_files",
        "tokenizer.json",
        "preprocessor_config.json",
        "--quantization",
        "float16",
    ],
    check=True,
)
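Before loading the model, it can help to sanity-check that the converter produced the expected output layout. A minimal sketch; the file names are what the conversion typically emits for Whisper models plus the files copied via --copy_files, so adjust the set if your output differs:

```python
from pathlib import Path

# Files we expect in the CTranslate2 output directory: the converted
# weights, the converter's config, and the files copied via --copy_files.
EXPECTED_FILES = {"model.bin", "config.json", "tokenizer.json", "preprocessor_config.json"}

def missing_converted_files(output_dir: str) -> set:
    """Return the set of expected files not present in output_dir."""
    path = Path(output_dir)
    present = {p.name for p in path.iterdir()} if path.is_dir() else set()
    return EXPECTED_FILES - present

missing = missing_converted_files("models/kb-whisper-large-ct2")
if missing:
    print(f"Conversion looks incomplete, missing: {sorted(missing)}")
```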
Then you should be able to load and use the converted model from the local output_dir you specified:
# Load the converted model
import whisperx

model = whisperx.load_model(
    "models/kb-whisper-large-ct2",
    device="cuda",
    compute_type="float16",
)
audio = whisperx.load_audio("data/audio.wav")
result = model.transcribe(audio, batch_size=8)
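If you want the transcript on disk, the segments in result["segments"] (dicts with start, end, and text in WhisperX's output) can be formatted however you like. A minimal SRT-writer sketch; the helper name and the sample data are mine, not part of WhisperX:

```python
def to_srt(segments) -> str:
    """Format WhisperX-style segments (dicts with start/end/text) as SRT."""
    def ts(seconds: float) -> str:
        # SRT timestamps look like HH:MM:SS,mmm
        ms = int(round(seconds * 1000))
        h, rem = divmod(ms, 3_600_000)
        m, rem = divmod(rem, 60_000)
        s, ms = divmod(rem, 1000)
        return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

    blocks = [
        f"{i}\n{ts(seg['start'])} --> {ts(seg['end'])}\n{seg['text'].strip()}"
        for i, seg in enumerate(segments, start=1)
    ]
    return "\n\n".join(blocks) + "\n"

# Example with a hypothetical segment:
print(to_srt([{"start": 0.0, "end": 2.5, "text": " Hej världen."}]))
# 1
# 00:00:00,000 --> 00:00:02,500
# Hej världen.
```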