What does "no_distill" mean exactly on the new v35 upload?

#8
by roe2 - opened

I see that one of the new v35 uploads has "no_distill" appended to the filename; what does that entail, exactly?

The _no_distill variant doesn't quantize the distilled_guidance_layer, final_layer, img_in, and txt_in layers to FP8; it leaves them in their original BF16 format with a scale of 1.0 (i.e., no scaling).

So, does this mean it leads to better prompt attention, higher quality, or less compression?

The existing convert_fp8_scaled_stochastic.py does not handle it this way. It would be nice to release a script for this as well.

Just made this repo for the new learned variant! I've found it performs closer to BF16 than the current stochastic implementation.
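For context on the "stochastic" part: stochastic rounding rounds each value up or down with probability proportional to its distance from the two neighbouring representable values, so the quantized value is unbiased in expectation. A toy illustration on an integer grid (the FP8 case works the same way on the FP8 value grid; this is an analogy, not the repo's code):

```python
import math
import random

def stochastic_round(x: float, rng: random.Random) -> int:
    # Round down, then bump up with probability equal to the fractional part,
    # so that E[stochastic_round(x)] == x.
    lo = math.floor(x)
    return lo + (rng.random() < (x - lo))

rng = random.Random(0)
samples = [stochastic_round(2.3, rng) for _ in range(100_000)]
mean = sum(samples) / len(samples)
# mean lands close to 2.3, whereas round-to-nearest would always give 2
```

A "learned" variant instead optimizes the rounding decisions (or scales) against the original weights, rather than randomizing them.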

But when I look at the FP8 checkpoints made by Comfy, some layers are kept in BF16 precision. Is this not needed in a DiT-structured model?
