Any chance of creating a v30 version to hold over until real support?

Opened by Todokete:

Title. :)

We're expecting that too.

If you have an RTX 40-series or newer GPU, you can get similar speed by using FP8 + torch.compile + SageAttention, with better quality on top (the author of the current hack admitted the quality isn't great, and I can confirm that).
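
For anyone who wants to try that route, here's a rough sketch of what it could look like with diffusers. I haven't verified this exact setup for Chroma; the ChromaPipeline class (only in recent diffusers releases), the model ID, and the SageAttention monkey-patch are assumptions on my part, not something confirmed in this thread:

```python
import torch
import torch.nn.functional as F
from diffusers import ChromaPipeline  # assumption: Chroma support in a recent diffusers release
from sageattention import sageattn

# Route plain attention calls through SageAttention; fall back to stock SDPA
# whenever a mask, dropout, or extra kwargs are involved, so nothing breaks.
_orig_sdpa = F.scaled_dot_product_attention
def _sage_sdpa(q, k, v, attn_mask=None, dropout_p=0.0, is_causal=False, **kwargs):
    if attn_mask is None and dropout_p == 0.0 and not kwargs:
        return sageattn(q, k, v, is_causal=is_causal)
    return _orig_sdpa(q, k, v, attn_mask=attn_mask, dropout_p=dropout_p,
                      is_causal=is_causal, **kwargs)
F.scaled_dot_product_attention = _sage_sdpa

# "lodestones/Chroma" is a placeholder model ID - point this at whatever
# diffusers-format Chroma repo you actually use.
pipe = ChromaPipeline.from_pretrained("lodestones/Chroma", torch_dtype=torch.bfloat16).to("cuda")

# Store transformer weights in FP8 to cut VRAM; compute still runs in bf16.
pipe.transformer.enable_layerwise_casting(
    storage_dtype=torch.float8_e4m3fn, compute_dtype=torch.bfloat16
)

# torch.compile + SageAttention are where the speed comes from; the first call
# is slow while it compiles, later calls are much faster.
pipe.transformer = torch.compile(pipe.transformer, mode="max-autotune")

image = pipe("a watercolor fox in a snowy forest", num_inference_steps=26).images[0]
image.save("out.png")
```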

That's all well and good if you've got the hardware for it, but int4 is the only thing that gives usable speeds on my 4GB 3050 laptop.

Owner

I think I'll do v36 when it's out - so probably toward the end of next week. Instructions here to create your own quant if you want:

https://huggingface.co/rocca/chroma-nunchaku-test/discussions/2#6843d2bba814d5f5dfb633df

Should only cost about $30 on runpod.

@rocca V36 is out, could you please create and share one for the detail-calibrated version?
https://huggingface.co/lodestones/Chroma/blob/main/chroma-unlocked-v36-detail-calibrated.safetensors

Okay, some v38/v38-detail-calibrated quants are cooking. If all goes well I'll upload them tomorrow.

Okay, bit of a delay but I've uploaded quants with a bunch of different settings. I'm not sure which one is best - please share your opinion here if you test multiple.

Note that "12-steps" and "32-steps" do not mean that you should use that many steps for inference. It just means that's how many steps were used for calibration. I found previously that calibrating with a lower number of steps (and a lower resolution) sometimes produced a more stable model, even when inferencing at a higher number of steps. Not sure if that holds true with v38 - again, share your experience in this thread to help others.

Also worth noting that we will likely get official nunchaku support soon, since diffusers support has now been merged: https://github.com/mit-han-lab/nunchaku/issues/167 Once official support lands, Chroma should easily run under 8GB VRAM (with no CPU offloading!) if using the new 4-bit AWQ T5 that Nunchaku provides.
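
For reference, the pattern Nunchaku already documents for Flux looks roughly like this (reproduced from memory, so double-check the import paths and repo names against the Nunchaku README); the eventual Chroma integration will presumably mirror it with a Chroma transformer instead of the Flux one:

```python
import torch
from diffusers import FluxPipeline
# Import paths follow Nunchaku's Flux examples as I recall them.
from nunchaku import NunchakuFluxTransformer2dModel, NunchakuT5EncoderModel

# SVDQuant 4-bit transformer plus Nunchaku's 4-bit AWQ T5-XXL text encoder.
# Repo/file names are from memory - check the Nunchaku README for the current ones.
transformer = NunchakuFluxTransformer2dModel.from_pretrained("mit-han-lab/svdq-int4-flux.1-dev")
text_encoder_2 = NunchakuT5EncoderModel.from_pretrained(
    "mit-han-lab/nunchaku-t5/awq-int4-flux.1-t5xxl.safetensors"
)

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    transformer=transformer,
    text_encoder_2=text_encoder_2,
    torch_dtype=torch.bfloat16,
).to("cuda")  # with both 4-bit pieces loaded, no CPU offloading is needed

image = pipe("a watercolor fox in a snowy forest", num_inference_steps=28).images[0]
image.save("flux-int4.png")
```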

I wholeheartedly want to express my gratitude to you for making Nunchaku quants of Chroma.
THANK YOU!

Thank you for your contribution. Is there any difference in quality between Nunchaku's FP4 quantization and traditional FP8?
