More details about the quantization process?
Hi, are there any links or documentation regarding the specific method of quantization you used? I am rather curious about why this model seems much more sensitive to quantization compared to others.
Also, can I ask how a standard BF16/FP16 "quantization" and your Q8_0 GGUF quant compare to the full FP32 precision results, and to your refined FP8 quant?
Hello Dr. Chen, actually this problem is not complicated. It comes down to a slight difference in the model key (state_dict) format, similar to the difference between diffusers core and ComfyUI core. The official Tencent-SRPO release provides a standard Flux.1-Dev "transformer" in diffusers format, whose keys differ somewhat from those of the "flux1-dev.safetensors" file in the root directory of the official Flux repo; you can compare the keys yourself. ComfyUI can recognize the model when loading it, but small bugs can appear if it quantizes to fp8 (e4m3/e5m2) directly during loading, as the keys are not handled with full compatibility. This is not an issue in diffusers, so reading the model with the diffusers script and quantizing there should work fine.
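A quick way to spot this kind of mismatch is to diff the key sets of the two checkpoints before quantizing. Below is a minimal sketch; the key names are illustrative placeholders, not the real Flux keys, and in practice you would read the actual keys from the safetensors files (e.g. with safetensors' `safe_open`):

```python
# Sketch: detect state_dict key mismatches between two checkpoint formats.
# The key names below are made-up placeholders for illustration only.

def key_diff(sd_a, sd_b):
    """Return the keys found in only one of two state_dicts."""
    only_a = sorted(set(sd_a) - set(sd_b))
    only_b = sorted(set(sd_b) - set(sd_a))
    return only_a, only_b

# Hypothetical diffusers-style "transformer" keys vs. single-file keys.
diffusers_sd = {"transformer_blocks.0.attn.to_q.weight": None}
single_file_sd = {"double_blocks.0.img_attn.qkv.weight": None}

only_diffusers, only_single = key_diff(diffusers_sd, single_file_sd)
print(only_diffusers)  # keys a single-file-style loader would not expect
print(only_single)     # keys missing from the diffusers-style checkpoint
```

If either list is non-empty, a loader that quantizes on the fly without remapping those keys could silently skip or mishandle some tensors, which matches the fp8 symptoms described above.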
Also, as for my v1 compared to the official model, the differences are very slight; I just personally prefer images that are more realistic and high-definition. SRPO is indeed quite good in terms of realism and prompt adherence! My Refine v1 only enhances clarity a bit; it performs better especially in complex detail textures, such as flowers, grass, and the knitted textures of clothing.
Below is a comparison chart so we can see the differences more intuitively:
It should be "Mr. Chen" and not "Dr."; I am not qualified for such a title, and I am basically just a hobbyist in this space (which I guess you are as well), and a rather inexperienced one at that. I am just interested in quantization techniques, and I thought your model description was referring to some fancy new technique.
So what you are saying is that the faulty fp8 quants are caused by bugs due to differences in the naming of the state_dict keys (diffusers format vs. ComfyUI)? And your fp8 quant is basically a slight finetune on top of the stock SRPO model, plus a fix for those quantization bugs?
Hi, Mr. Chen, I guess it can be understood that way, more or less...
