Hacked-up version of the ai-hub-apps repo used to export this model:
python .\export.py --target-runtime onnx --device "Snapdragon X Elite CRD" --skip-profiling --skip-inferencing
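Once the export finishes, the resulting ONNX graphs can be sanity-checked with ONNX Runtime's QNN execution provider. This is a minimal sketch only, assuming an ONNX Runtime build with QNN support on Windows on Snapdragon; the file name below is a placeholder for whatever export.py actually produced.

import onnxruntime as ort

# Placeholder path -- substitute the encoder ONNX file produced by export.py.
encoder_path = "WhisperEncoderInf.onnx"

session = ort.InferenceSession(
    encoder_path,
    providers=["QNNExecutionProvider", "CPUExecutionProvider"],
    # backend_path points the QNN EP at the HTP (NPU) backend library.
    provider_options=[{"backend_path": "QnnHtp.dll"}, {}],
)

# Confirm the 128-mel input shape made it into the exported graph.
for inp in session.get_inputs():
    print(inp.name, inp.shape, inp.type)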
Patched: whisper/model.py
# The number of Mel features per audio context
# N_MELS = 80
# For Whisper V3 Turbo
N_MELS = 128
## Commented out for now, as we want to use this for Whisper V3 Turbo
# # Audio embedding length
# AUDIO_EMB_LEN = int(N_SAMPLES / N_MELS / 4)
# # Audio length per MEL feature
# MELS_AUDIO_LEN = AUDIO_EMB_LEN * 2
# Number of frames in the input mel spectrogram (e.g. 3000 for 30s audio at 160 hop_length).
# This corresponds to the 'n_frames' dimension of the mel spectrogram input to the Whisper AudioEncoder.
MELS_AUDIO_LEN = N_SAMPLES // HOP_LENGTH
# Length of the audio embedding from the encoder output (e.g. 1500).
# This corresponds to 'n_audio_ctx' in Whisper, which is MELS_AUDIO_LEN // 2
# due to the strided convolution in the encoder. This length is used for the
# cross-attention key/value cache from the encoder.
AUDIO_EMB_LEN = MELS_AUDIO_LEN // 2
WHISPER_VERSION = "large-v3-turbo"
# N_MELS_LARGE_V3_TURBO = 128
# DEFAULT_INPUT_SEQ_LEN = 3000
@CollectionModel.add_component(WhisperEncoderInf)
@CollectionModel.add_component(WhisperDecoderInf)
class WhisperV3Turbo(BaseWhisper):
    @classmethod
    def from_pretrained(cls):
        return super().from_pretrained(WHISPER_VERSION)
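As a quick sanity check on the constants in the patch above, using the standard Whisper framing parameters (16 kHz sample rate, hop length 160, 30-second chunks, so N_SAMPLES = 480000):

SAMPLE_RATE = 16000
HOP_LENGTH = 160
CHUNK_LENGTH = 30  # seconds of audio per mel spectrogram
N_SAMPLES = SAMPLE_RATE * CHUNK_LENGTH  # 480000

MELS_AUDIO_LEN = N_SAMPLES // HOP_LENGTH  # 3000 mel frames
AUDIO_EMB_LEN = MELS_AUDIO_LEN // 2       # 1500 encoder output positions

assert MELS_AUDIO_LEN == 3000
assert AUDIO_EMB_LEN == 1500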
You also need to patch this into the ai-hub library
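A hypothetical smoke test for the patched class is sketched below; the import path depends on where whisper/model.py lives in your patched ai-hub install, and the first run downloads the model weights.

# Assumption: the patched whisper/model.py is importable in your environment;
# adjust the import to match the package layout of your ai-hub checkout.
from whisper.model import WhisperV3Turbo

# Pulls the "large-v3-turbo" checkpoint via the base class, as in the patch above.
model = WhisperV3Turbo.from_pretrained()
print(model)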