Why q8 Quantization Appears Smaller

#12
by percisestretch

There does appear to be an anomaly: model_q8f16.onnx (86 MB) is smaller than model_q4f16.onnx (154 MB) and even smaller than model_uint8.onnx (177 MB), despite 8-bit quantization nominally using more bits per weight than 4-bit.
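
One way to look into this is to tally how many bytes each tensor data type contributes inside each ONNX file; differences often come from which tensors were actually quantized and what per-block scales and zero-points were stored alongside them. The following is a minimal sketch using the `onnx` Python package, assuming the three files from this repo have been downloaded locally; the helper name `bytes_per_dtype` is illustrative, not part of any existing tool.

```python
# Sketch: report how many bytes each initializer dtype contributes per file.
# Assumes the .onnx files are present in the working directory.
from collections import Counter

import onnx
from onnx import TensorProto


def bytes_per_dtype(path):
    # load_external_data=False so we only inspect what is stored in this file
    model = onnx.load(path, load_external_data=False)
    totals = Counter()
    for init in model.graph.initializer:
        dtype_name = TensorProto.DataType.Name(init.data_type)
        # Most exporters serialize weights into raw_data; tensors stored in
        # typed fields (e.g. float_data) would show up here as 0 bytes.
        totals[dtype_name] += len(init.raw_data)
    return totals


for path in ["model_q4f16.onnx", "model_q8f16.onnx", "model_uint8.onnx"]:
    print(path, dict(bytes_per_dtype(path)))
```

Comparing the per-dtype totals across the three files should show where the extra megabytes in the q4 and uint8 variants come from.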
