Why q8 Quantization Appears Smaller

#12

by percisestretch - opened 28 days ago

There appears to be an anomaly: model_q8f16.onnx (86 MB) is smaller than model_q4f16.onnx (154 MB), and even smaller than model_uint8.onnx (177 MB).
