q4f16 ONNX model issue
#5
by mrniamster - opened
When running the code below, it gives the following error:
import { pipeline } from '@huggingface/transformers';
const generator = await pipeline('text-generation', 'onnx-community/DeepSeek-R1-Distill-Qwen-1.5B-ONNX', { dtype: 'q4f16' });
An error occurred during model execution: "Error: Non-zero status code returned while running Cast node. Name:'InsertedPrecisionFreeCast_/model/layers.1/attn/v_proj/repeat_kv/Reshape_4/output_0' Status Message: D:\a\_work\1\s\onnxruntime\core\framework\op_kernel.cc:83 onnxruntime::OpKernelContext::OutputMLValue status.IsOK() was false. Shape mismatch attempting to re-use buffer. {1,1,1536} != {1,12,1536}. Validate usage of dim_value (values should be > 0) and dim_param (all values with the same string should equate to the same size) in shapes in the model."
Indeed, that's a bug in the Node.js implementation of onnxruntime (it works correctly in the browser).
As a workaround, you can pin an earlier revision (pre https://huggingface.co/onnx-community/DeepSeek-R1-Distill-Qwen-1.5B-ONNX/commit/f9c94fd59ec97bdb5e7587d09343797481a8c385) to use the GQA variant of the model, as sketched below.
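For anyone wanting to try that, here is a minimal sketch using the `revision` loading option in transformers.js; the SHA below is a placeholder, not a real commit hash, so substitute the hash of whichever commit before the one linked above you want to pin:

```js
import { pipeline } from '@huggingface/transformers';

// Pin the repo to a revision from before the breaking commit so the
// GQA variant of the model is downloaded. 'EARLIER_COMMIT_SHA' is a
// placeholder: use the SHA of any commit prior to f9c94fd5...
const generator = await pipeline(
  'text-generation',
  'onnx-community/DeepSeek-R1-Distill-Qwen-1.5B-ONNX',
  { dtype: 'q4f16', revision: 'EARLIER_COMMIT_SHA' },
);
```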
cc @schmuell
Odd, it works for me with onnxruntime-genai and the native WebGPU EP.
Which EP are you using? If it is CPU, it is not going to like the fp16 and will cast to fp32 by inserting Cast ops into the graph. I wonder if something is going wrong there.
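If the CPU EP's inserted casts are the culprit, one thing to try is requesting a different execution provider explicitly. A minimal sketch, assuming the `device` loading option from transformers.js v3 ('webgpu' only works in a WebGPU-capable environment, e.g. a recent Chromium browser):

```js
import { pipeline } from '@huggingface/transformers';

// Request the WebGPU execution provider instead of the default CPU EP,
// so fp16 tensors are handled natively rather than via inserted fp32 Casts.
const generator = await pipeline(
  'text-generation',
  'onnx-community/DeepSeek-R1-Distill-Qwen-1.5B-ONNX',
  { dtype: 'q4f16', device: 'webgpu' },
);

const output = await generator('Why is the sky blue?', { max_new_tokens: 128 });
console.log(output[0].generated_text);
```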