# whisper-large-v3-onnx-w8a16-dynamic
This repository contains the ONNX version of the openai/whisper-large-v3 model.
## Model Details
The original model can be found here: [openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3)
## Quantization
This model has been quantized to w8a16 (8-bit weights, 16-bit activations) using dynamic quantization. This reduces the model size and can improve inference speed, especially on CPUs.
## Usage
The model can be used with `optimum.onnxruntime.ORTModelForSpeechSeq2Seq`:
```python
from optimum.onnxruntime import ORTModelForSpeechSeq2Seq
from transformers import WhisperProcessor

model_name = "mirekphd/whisper-large-v3-onnx-w8a16-dynamic"
processor = WhisperProcessor.from_pretrained(model_name)
model = ORTModelForSpeechSeq2Seq.from_pretrained(model_name)

# ... add your inference code here ...
```