
Results are only good when image size is 768x768

#22
by MazzzyStar - opened

512x512:
[image: e8df9c7e20906b23e3555595ade7a779.jpg]
768x768 with same prompt:
[image: 58a65448fe71885f2813fb38e75d4d4d.jpg]

512x512:
[image: 86ebdf7e450e752fae757bfeba02417d.jpg]
768x768 with same prompt:
[image: 29fd2494d114a8c13bcf3b824234b0c8.jpg]

Even 512x768 does not perform well when using 768.ckpt. Can anyone tell me why?

That is because this model was trained to perform best at 768x768. If you want to generate at 512x512, I suggest you use the base 512 model instead. (https://huggingface.co/stabilityai/stable-diffusion-2-base/blob/main/512-base-ema.ckpt)
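For reference, here is a minimal diffusers sketch of the two options being discussed. The model IDs, prompt, and settings are illustrative examples, not something taken from this thread:

```python
import torch
from diffusers import StableDiffusionPipeline

prompt = "a photo of an astronaut riding a horse"  # example prompt

# 768 model: trained for 768x768 outputs, so generate at that size.
pipe_768 = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2", torch_dtype=torch.float16
).to("cuda")
image_768 = pipe_768(prompt, height=768, width=768).images[0]

# Base model: trained for 512x512 outputs, better suited to 512x512 requests.
pipe_512 = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-base", torch_dtype=torch.float16
).to("cuda")
image_512 = pipe_512(prompt, height=512, width=512).images[0]
```

Note that `height` and `width` must be divisible by 8; other sizes are accepted by the pipeline, but quality tends to drop the further you move from the model's training resolution.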

@Ayitsmatt Thanks, I know that. But what about 512x768 or other aspect ratios? I could only find 768x768 results under the "stablediffusion2" hashtag on Twitter, which made me feel the model might not be flexible or suitable for other sizes. I just wanted to confirm that.

deleted

Honestly, I'm not sure. I just know that the model seems to perform better at 768x768.

@Ayitsmatt Thanks for your answer.

I don't know what you were using to interface with it, but I noticed that the Colabs that worked best with 1.4 and 1.5 all seemed to have custom scripts for handling images larger than 512x512, to get the model to paint beyond its training resolution without starting a totally new image (which it still did, sometimes). I don't think anyone has done that for the new model yet. That's my working theory, anyway.

@maxspire Thanks for pointing that out. What confused me is that 1.4 and 1.5 were much more flexible with image size: even with an irregular input size for img2img, they often produced a fairly reasonable result. With SD 2.0, that flexibility seems to have disappeared. It does not work well when the input size is not 768x768, and that was my question.
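As a rough sketch of the img2img workaround in diffusers: resize the irregular input to the model's native resolution before passing it in. The model ID, input file, and strength value here are placeholder assumptions (and older diffusers releases used `init_image` instead of `image`):

```python
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2", torch_dtype=torch.float16
).to("cuda")

# Resize the (possibly irregular) input to 768x768, the SD 2.0 training resolution.
init_image = Image.open("input.jpg").convert("RGB").resize((768, 768))

result = pipe(
    prompt="a detailed painting of a castle",  # example prompt
    image=init_image,
    strength=0.6,  # how much the input image is altered; 0 = keep, 1 = ignore
).images[0]
result.save("out.png")
```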
