Feedback and sample images

#8
by concedo - opened

Heya, in case you haven't seen, we've been playing around with Chroma on stable-diffusion.cpp and noticed two flaws:

  1. Generating at 512x512 produces significantly worse images, probably due to a lack of training data at this resolution.
  2. Generating with short prompts, e.g. (cat), seems to result in significantly worse images too.

Check out some sample images at https://github.com/leejet/stable-diffusion.cpp/pull/696

image.png

Hopefully some of these shortcomings can be addressed by injecting more diversity into the training data.

Cheers!

That has nothing to do with the model and everything to do with the implementation in the PR.
If issues this severe were standard, don't you think you would have seen more examples of this from other places?
By the way, if T5 in stable-diffusion.cpp previously did not require a specific total chunk length or an attention mask, then adding an attention mask is unnecessary.
An attention mask is only needed if one uses the transformers default, which pads the T5 input to 512 tokens. By simply not padding to 512, you effectively remove the need for the mask.
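
A minimal sketch of the difference, assuming Python with Hugging Face transformers (the model id and prompt are just illustrative):

```python
from transformers import T5TokenizerFast

# Illustrative tokenizer; any T5 tokenizer shows the same behavior.
tokenizer = T5TokenizerFast.from_pretrained("google/t5-v1_1-xxl")

prompt = "a photo of a cat"

# transformers-default style: pad the sequence out to a fixed 512 tokens.
# The attention mask is required here so the encoder ignores the padding.
padded = tokenizer(prompt, padding="max_length", max_length=512,
                   truncation=True, return_tensors="pt")
print(padded.input_ids.shape)            # (1, 512)
print(int(padded.attention_mask.sum()))  # number of real (non-pad) tokens

# Unpadded: only real tokens are encoded, so there is nothing for a
# mask to exclude and it can simply be omitted.
unpadded = tokenizer(prompt, return_tensors="pt")
print(unpadded.input_ids.shape)          # (1, number_of_real_tokens)
```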
Also, empty negatives produce artifacts, since the model is de-distilled Schnell from the start and BFL did not use an attention mask. So either use a negative with a reasonable token count, or pad the negative to a set minimum if it is empty.
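
A hedged sketch of that workaround in Python; MIN_NEG_TOKENS and the helper name are assumptions for illustration, not anything from stable-diffusion.cpp:

```python
import torch
from transformers import T5TokenizerFast

tokenizer = T5TokenizerFast.from_pretrained("google/t5-v1_1-xxl")

# Assumed tunable: the "set minimum" to pad an empty negative up to.
MIN_NEG_TOKENS = 10

def encode_negative(negative_prompt: str) -> torch.Tensor:
    """Tokenize the negative prompt, padding it to a minimum length
    instead of passing a near-empty sequence to the encoder."""
    ids = tokenizer(negative_prompt, return_tensors="pt").input_ids
    if ids.shape[1] < MIN_NEG_TOKENS:
        # pad with the tokenizer's pad token up to the minimum length
        pad = torch.full((1, MIN_NEG_TOKENS - ids.shape[1]),
                         tokenizer.pad_token_id, dtype=ids.dtype)
        ids = torch.cat([ids, pad], dim=1)
    return ids

print(encode_negative("").shape)  # always at least (1, MIN_NEG_TOKENS)
```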
Also, single-token prompting on a model with an unpadded T5 encoder and no CLIP is obviously out of scope.

@silveroxides For clarification, the image on the left (the less broken one) was made using ComfyUI with city96's GGUF node (on the CPU backend, because Chroma crashes with DirectML for some reason). The one on the right was made using sdcpp with T5 masking. When disabling masking for T5 (which I now believe to be the correct way to do it?), it looks like this:
image.png

> Also, single-token prompting on a model with an unpadded T5 encoder and no CLIP is obviously out of scope.

That does seem to be the case.

> An attention mask is only needed if one uses the transformers default, which pads the T5 input to 512 tokens. By simply not padding to 512, you effectively remove the need for the mask.

I will look into that.

Also, never leave the negative empty. It is better to pad the negative than to leave it empty.
