Will there be an ONNX version that is not quantized?

#4
by Maximum2000 - opened

Are you guys planning on releasing an ONNX version that is not quantized, similar to how we have a GGUF version?

It's OK, I generated my own.

Maximum2000 changed discussion status to closed

@Maximum2000 Can you please share the recipe for converting the ONNX model?

I used this doc: phi-4-multi-modal.md, but by default it quantizes to int4. If you don't want it quantized, follow this issue: Phi-4-multimodal-instruct ONNX fp32 fails to load.
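For reference, a minimal sketch of the kind of command involved, assuming the model builder from the onnxruntime-genai repo (where phi-4-multi-modal.md lives) and its documented `-m`/`-o`/`-p`/`-e` flags; the multimodal vision/speech components have their own steps in that doc, and the fp32 path may hit the loading issue linked above:

```
# Sketch, not a verified recipe: assumes the onnxruntime-genai model
# builder accepts this model ID directly. Passing -p fp32 instead of
# the default int4 skips quantization; -e cpu picks the CPU execution
# provider (cuda/dml are the usual alternatives).
python -m onnxruntime_genai.models.builder \
    -m microsoft/Phi-4-multimodal-instruct \
    -o ./phi4-mm-onnx-fp32 \
    -p fp32 \
    -e cpu
```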
