Will there be an ONNX version that is not quantized?
#4 by Maximum2000 - opened
Are you guys planning on releasing an ONNX version that is not quantized, similar to how we have the GGUF version?
It's OK, I generated my own.
Maximum2000 changed discussion status to closed
@Maximum2000 Can you please share the recipe for converting the model to ONNX?
I used this doc: phi-4-multi-modal.md, but it quantizes to 4-bit by default. If you don't want it quantized, follow this issue: Phi-4-multimodel-instruct ONNX fp32 fails to load.
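For reference, a minimal sketch of overriding the default quantization with the onnxruntime-genai model builder, which is what the doc above is built around. This is an assumption, not the exact recipe from phi-4-multi-modal.md: the multimodal pipeline also builds the vision and speech components separately, and the output path here is illustrative. Note the linked issue reports that the fp32 build fails to load, so fp16 may be the practical choice.

```python
# Sketch: run the onnxruntime-genai model builder with an explicit precision
# instead of the default int4. Flags (-m/-o/-p/-e) are the builder's documented
# options; the output folder is a hypothetical example.
import subprocess

subprocess.run(
    [
        "python", "-m", "onnxruntime_genai.models.builder",
        "-m", "microsoft/Phi-4-multimodal-instruct",  # source model on Hugging Face
        "-o", "./phi4-mm-onnx-fp16",                  # output folder (illustrative)
        "-p", "fp16",                                 # precision: fp16/fp32 instead of default int4
        "-e", "cpu",                                  # execution provider (cpu, cuda, ...)
    ],
    check=True,  # raise if the builder exits with an error
)
```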