Why q8 Quantization Appears Smaller
#12 opened by percisestretch
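There appears to be an anomaly: model_q8f16.onnx (86 MB) is smaller than model_q4f16.onnx (154 MB), and even smaller than model_uint8.onnx (177 MB). Storing weights at 8 bits should take roughly twice the space of storing them at 4 bits, and about the same space as another 8-bit variant, not roughly half.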
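To put numbers on it, here is a rough back-of-the-envelope sketch in Python. It assumes the file names describe the weight precision (4-bit vs 8-bit) and ignores overhead such as quantization scales, zero points, and any tensors left in fp16, so the "expected" ratios are only approximate.

```python
# Reported file sizes in MB, taken from the post.
sizes_mb = {
    "model_q4f16.onnx": 154,
    "model_q8f16.onnx": 86,
    "model_uint8.onnx": 177,
}

# Nominal bits per quantized weight implied by each file name.
# Assumption: the names reflect weight precision; scales, zero points and
# unquantized tensors add overhead that is not modeled here.
nominal_bits = {
    "model_q4f16.onnx": 4,
    "model_q8f16.onnx": 8,
    "model_uint8.onnx": 8,
}


def size_ratio(a: str, b: str) -> float:
    """Observed size of file a relative to file b."""
    return sizes_mb[a] / sizes_mb[b]


def expected_ratio(a: str, b: str) -> float:
    """Size ratio predicted from weight bit width alone."""
    return nominal_bits[a] / nominal_bits[b]


for a, b in [("model_q8f16.onnx", "model_q4f16.onnx"),
             ("model_q8f16.onnx", "model_uint8.onnx")]:
    print(f"{a} / {b}: observed {size_ratio(a, b):.2f}, "
          f"expected ~{expected_ratio(a, b):.2f}")
```

This prints an observed ratio of 0.56 against an expected ~2.00 for q8f16 vs q4f16, and 0.49 against ~1.00 for q8f16 vs uint8. Overhead can shift these numbers somewhat, but not by a factor of three or four, which is why the q8f16 file size looks suspicious.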