Feedback and sample images

#8
by concedo - opened

Heya, in case you haven't seen, we've been playing around with Chroma on stable-diffusion.cpp and noticed two flaws:

  1. Generating at 512x512 produces significantly worse images, probably due to a lack of training data at this resolution.
  2. Generating with short prompts, e.g. (cat), seems to result in significantly worse images too.

Check out some sample images at https://github.com/leejet/stable-diffusion.cpp/pull/696

image.png

Hopefully some of these shortcomings can be addressed by injecting more diversity into the training data.

Cheers!

That has nothing to do with the model and everything to do with the implementation in the PR.
If issues this severe were standard, don't you think you would have seen more examples of this from other places?
By the way, if T5 in stable-diffusion.cpp previously did not require a specific total chunk length or an attention mask, then adding an attention mask is unnecessary.
An attention mask is only needed if one uses the transformers default, which pads the T5 input to 512 tokens. By simply not padding to 512, you effectively remove the need for the mask.
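
A minimal sketch of the difference, assuming Python with Hugging Face transformers (the model id and prompt are just illustrative):

```python
from transformers import T5TokenizerFast

# Illustrative tokenizer; any T5 tokenizer shows the same behavior.
tokenizer = T5TokenizerFast.from_pretrained("google/t5-v1_1-xxl")

prompt = "a photo of a cat"

# transformers-default style: pad the sequence out to a fixed 512 tokens.
# The attention mask is required here so the encoder ignores the padding.
padded = tokenizer(prompt, padding="max_length", max_length=512,
                   truncation=True, return_tensors="pt")
print(padded.input_ids.shape)            # (1, 512)
print(int(padded.attention_mask.sum()))  # number of real (non-pad) tokens

# Unpadded: only real tokens are encoded, so there is nothing for a
# mask to exclude and it can simply be omitted.
unpadded = tokenizer(prompt, return_tensors="pt")
print(unpadded.input_ids.shape)          # (1, number_of_real_tokens)
```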
Also, empty negatives produce artifacts, since the model is de-distilled Schnell from the start and BFL did not use an attention mask. So either use a negative with a reasonable token count, or pad the negative to a set minimum if it is empty.
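
A hedged sketch of that workaround in Python; MIN_NEG_TOKENS and the helper name are assumptions for illustration, not anything from stable-diffusion.cpp:

```python
import torch
from transformers import T5TokenizerFast

tokenizer = T5TokenizerFast.from_pretrained("google/t5-v1_1-xxl")

# Assumed tunable: the "set minimum" to pad an empty negative up to.
MIN_NEG_TOKENS = 10

def encode_negative(negative_prompt: str) -> torch.Tensor:
    """Tokenize the negative prompt, padding it to a minimum length
    instead of passing a near-empty sequence to the encoder."""
    ids = tokenizer(negative_prompt, return_tensors="pt").input_ids
    if ids.shape[1] < MIN_NEG_TOKENS:
        # pad with the tokenizer's pad token up to the minimum length
        pad = torch.full((1, MIN_NEG_TOKENS - ids.shape[1]),
                         tokenizer.pad_token_id, dtype=ids.dtype)
        ids = torch.cat([ids, pad], dim=1)
    return ids

print(encode_negative("").shape)  # always at least (1, MIN_NEG_TOKENS)
```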
Also, single-token prompting on a model with an unpadded T5 encoder and no CLIP is obviously out of scope.

@silveroxides For clarification, the image on the left (the less broken one) was made using ComfyUI with city96's GGUF node (on the CPU backend, because Chroma crashes with DirectML for some reason). The one on the right was made using sdcpp with T5 masking. When disabling masking for T5 (which I now believe to be the correct way to do it?), it looks like this:
image.png

> Also, single-token prompting on a model with an unpadded T5 encoder and no CLIP is obviously out of scope.

That does seem to be the case.

> An attention mask is only needed if one uses the transformers default, which pads the T5 input to 512 tokens. By simply not padding to 512, you effectively remove the need for the mask.

I will look into that.

Also, never leave the negative empty. It is better to pad the negative than to leave it empty.
