q4f16 ONNX model issue
#5
by mrniamster - opened
When running the code below, it gives the following error:
import { pipeline } from '@huggingface/transformers';
const generator = await pipeline('text-generation', 'onnx-community/DeepSeek-R1-Distill-Qwen-1.5B-ONNX', { dtype: 'q4f16' });
An error occurred during model execution: "Error: Non-zero status code returned while running Cast node. Name:'InsertedPrecisionFreeCast_/model/layers.1/attn/v_proj/repeat_kv/Reshape_4/output_0' Status Message: D:\a\_work\1\s\onnxruntime\core\framework\op_kernel.cc:83 onnxruntime::OpKernelContext::OutputMLValue status.IsOK() was false. Shape mismatch attempting to re-use buffer. {1,1,1536} != {1,12,1536}. Validate usage of dim_value (values should be > 0) and dim_param (all values with the same string should equate to the same size) in shapes in the model."
Indeed, that's a bug in the Node.js implementation of onnxruntime (it works correctly in the browser).
As a workaround, you can pin an earlier revision (pre https://huggingface.co/onnx-community/DeepSeek-R1-Distill-Qwen-1.5B-ONNX/commit/f9c94fd59ec97bdb5e7587d09343797481a8c385) to use the GQA variant of the model, as sketched below.
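For anyone wanting to try that, here is a minimal sketch using the `revision` loading option in transformers.js; the SHA below is a placeholder, not a real commit hash, so substitute the hash of whichever commit before the one linked above you want to pin:

```js
import { pipeline } from '@huggingface/transformers';

// Pin the repo to a revision from before the breaking commit so the
// GQA variant of the model is downloaded. 'EARLIER_COMMIT_SHA' is a
// placeholder: use the SHA of any commit prior to f9c94fd5...
const generator = await pipeline(
  'text-generation',
  'onnx-community/DeepSeek-R1-Distill-Qwen-1.5B-ONNX',
  { dtype: 'q4f16', revision: 'EARLIER_COMMIT_SHA' },
);
```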
cc @schmuell
Odd, it works for me with onnxruntime-genai and the native WebGPU EP.
Which EP are you using? If it is CPU, it is not going to like the fp16 and will cast to fp32 by inserting Cast ops into the graph. I wonder if something is going wrong there.
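If the CPU EP's inserted casts are the culprit, one thing to try is requesting a different execution provider explicitly. A minimal sketch, assuming the `device` loading option from transformers.js v3 ('webgpu' only works in a WebGPU-capable environment, e.g. a recent Chromium browser):

```js
import { pipeline } from '@huggingface/transformers';

// Request the WebGPU execution provider instead of the default CPU EP,
// so fp16 tensors are handled natively rather than via inserted fp32 Casts.
const generator = await pipeline(
  'text-generation',
  'onnx-community/DeepSeek-R1-Distill-Qwen-1.5B-ONNX',
  { dtype: 'q4f16', device: 'webgpu' },
);

const output = await generator('Why is the sky blue?', { max_new_tokens: 128 });
console.log(output[0].generated_text);
```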