whisper-large-v3-onnx-w8a16-dynamic

This repository contains the ONNX version of the openai/whisper-large-v3 model.

Model Details

The original model can be found here: openai/whisper-large-v3

Quantization

This model has been quantized to w8a16 (8-bit integer weights, 16-bit activations) using dynamic quantization, in which activation scales are computed at run time rather than calibrated offline. This shrinks the weights to roughly a quarter of their float32 size and can improve inference speed, especially on CPUs.

Usage

The model can be used with optimum.onnxruntime.ORTModelForSpeechSeq2Seq.

```python
from optimum.onnxruntime import ORTModelForSpeechSeq2Seq
from transformers import WhisperProcessor

model_name = "mirekphd/whisper-large-v3-onnx-w8a16-dynamic"

# The processor bundles the feature extractor (audio -> log-mel features)
# and the tokenizer (token ids -> text).
processor = WhisperProcessor.from_pretrained(model_name)

# Loads the quantized ONNX graphs and runs them through onnxruntime.
model = ORTModelForSpeechSeq2Seq.from_pretrained(model_name)

# ... add your inference code here ...
```