Any chance of creating a v30 version to hold over until real support?

Opened by Todokete:

Title. :)

We're expecting that too.

If you have an RTX 40-series or newer GPU, you can get similar speed by using FP8 + torch.compile + SageAttention, with better quality on top (the author of the current hack admitted the quality isn't great, and I can confirm that).
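
For anyone who wants to try that route, here's a rough sketch of what it could look like with diffusers. I haven't verified this exact setup for Chroma; the ChromaPipeline class (only in recent diffusers releases), the model ID, and the SageAttention monkey-patch are assumptions on my part, not something confirmed in this thread:

```python
import torch
import torch.nn.functional as F
from diffusers import ChromaPipeline  # assumption: Chroma support in a recent diffusers release
from sageattention import sageattn

# Route plain attention calls through SageAttention; fall back to stock SDPA
# whenever a mask, dropout, or extra kwargs are involved, so nothing breaks.
_orig_sdpa = F.scaled_dot_product_attention
def _sage_sdpa(q, k, v, attn_mask=None, dropout_p=0.0, is_causal=False, **kwargs):
    if attn_mask is None and dropout_p == 0.0 and not kwargs:
        return sageattn(q, k, v, is_causal=is_causal)
    return _orig_sdpa(q, k, v, attn_mask=attn_mask, dropout_p=dropout_p,
                      is_causal=is_causal, **kwargs)
F.scaled_dot_product_attention = _sage_sdpa

# "lodestones/Chroma" is a placeholder model ID - point this at whatever
# diffusers-format Chroma repo you actually use.
pipe = ChromaPipeline.from_pretrained("lodestones/Chroma", torch_dtype=torch.bfloat16).to("cuda")

# Store transformer weights in FP8 to cut VRAM; compute still runs in bf16.
pipe.transformer.enable_layerwise_casting(
    storage_dtype=torch.float8_e4m3fn, compute_dtype=torch.bfloat16
)

# torch.compile + SageAttention are where the speed comes from; the first call
# is slow while it compiles, later calls are much faster.
pipe.transformer = torch.compile(pipe.transformer, mode="max-autotune")

image = pipe("a watercolor fox in a snowy forest", num_inference_steps=26).images[0]
image.save("out.png")
```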

That's all well and good if you've got the hardware for it, but int4 is the only thing that gives usable speeds on my 4GB 3050 laptop.

Owner

I think I'll do v36 when it's out - so probably toward the end of next week. Instructions here to create your own quant if you want:

https://huggingface.co/rocca/chroma-nunchaku-test/discussions/2#6843d2bba814d5f5dfb633df

Should only cost about $30 on runpod.

@rocca V36 is out, could you please create and share one for the detail-calibrated version?
https://huggingface.co/lodestones/Chroma/blob/main/chroma-unlocked-v36-detail-calibrated.safetensors

Okay, some v38/v38-detail-calibrated quants are cooking. If all goes well I'll upload them tomorrow.

Okay, bit of a delay but I've uploaded quants with a bunch of different settings. I'm not sure which one is best - please share your opinion here if you test multiple.

Note that "12-steps" and "32-steps" do not mean that you should use that many steps for inference. It just means that's how many steps were used for calibration. I found previously that calibrating with a lower number of steps (and a lower resolution) sometimes produced a more stable model, even when inferencing at a higher number of steps. Not sure if that holds true with v38 - again, share your experience in this thread to help others.

Also worth noting that we will likely get official nunchaku support soon, since diffusers support has now been merged: https://github.com/mit-han-lab/nunchaku/issues/167 Once official support lands, Chroma should easily run under 8GB VRAM (with no CPU offloading!) if using the new 4-bit AWQ T5 that Nunchaku provides.
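
For reference, the pattern Nunchaku already documents for Flux looks roughly like this (reproduced from memory, so double-check the import paths and repo names against the Nunchaku README); the eventual Chroma integration will presumably mirror it with a Chroma transformer instead of the Flux one:

```python
import torch
from diffusers import FluxPipeline
# Import paths follow Nunchaku's Flux examples as I recall them.
from nunchaku import NunchakuFluxTransformer2dModel, NunchakuT5EncoderModel

# SVDQuant 4-bit transformer plus Nunchaku's 4-bit AWQ T5-XXL text encoder.
# Repo/file names are from memory - check the Nunchaku README for the current ones.
transformer = NunchakuFluxTransformer2dModel.from_pretrained("mit-han-lab/svdq-int4-flux.1-dev")
text_encoder_2 = NunchakuT5EncoderModel.from_pretrained(
    "mit-han-lab/nunchaku-t5/awq-int4-flux.1-t5xxl.safetensors"
)

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    transformer=transformer,
    text_encoder_2=text_encoder_2,
    torch_dtype=torch.bfloat16,
).to("cuda")  # with both 4-bit pieces loaded, no CPU offloading is needed

image = pipe("a watercolor fox in a snowy forest", num_inference_steps=28).images[0]
image.save("flux-int4.png")
```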

I wholeheartedly want to express my gratitude to you for making Nunchaku quants of Chroma.
THANK YOU!

Thank you for your contribution. Is there any difference in quality between Nunchaku's FP4 quantization and traditional FP8?
