Why does my image look like a mess at CFG: 5?

#82

by CSM360 - opened Jul 25

Jul 25

Basically, when I set CFG to ~5, which as far as I know is the recommended value, the image looks really messy and noisy. The buildings in the background look crooked and broken.

When I increase CFG to 12, the image becomes much clearer: the skyscrapers make sense, with proper walls and a logical structure, and the overall image quality is fantastic by comparison. The problem is that I get weird discoloration on the right side of the image.
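For context, below is the standard classifier-free guidance combine (as far as I know, the `uncond + (cond - uncond) * cfg` form that ComfyUI's KSampler uses). Here x_c and x_u are the conditional and unconditional predictions and s is the CFG scale. Going from 5 to 12 multiplies the guidance term by 2.4. That plausibly explains both the sharper structure and the higher risk of overshoot (burned or discolored regions). Treat this as rough intuition, not a diagnosis of this particular image.

```latex
\hat{x} = x_u + s\,(x_c - x_u), \qquad
\frac{12\,(x_c - x_u)}{5\,(x_c - x_u)} = \frac{12}{5} = 2.4
```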

What am I doing wrong? I'm linking my images for reference. Hopefully the metadata isn't stripped and you can load them into ComfyUI.

I'm using a CONCAT node to merge two parts of the prompt. I tried removing it to see if that was the problem. The image does get a bit more coherent, but it's still nowhere near as good as CFG: 12.

CFG: 5, messy and broken buildings

CFG: 12, much better visual quality, but note the broken colors on the right

CFG: 5 without CONCAT node, more coherent, but nowhere near as good as CFG: 12

Clybius

Jul 25

edited Jul 25

Are you running the model in BF16? If not, switching may help, provided it fits in your GPU's VRAM. If you can't use BF16 or are using regular FP8, a scaled FP8 implementation may also help compared with regular FP8.
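To illustrate why scaling can matter, here is a minimal pure-Python simulation of FP8 E4M3 rounding. The format's largest finite value is 448 and its smallest subnormal is 2^-9. The sketch compares plain casting with a single per-tensor scale. The helper names and sample weights are hypothetical. This shows the general idea only and is not ComfyUI's scaled-FP8 code.

```python
import math

FP8_E4M3_MAX = 448.0    # largest finite E4M3 value
FP8_E4M3_MIN_EXP = -6   # exponent of the smallest normal number (2**-6)
MANTISSA_BITS = 3

def to_fp8_e4m3(x: float) -> float:
    """Round x to the nearest E4M3 value (ties-to-even), saturating at +/-448."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    a = abs(x)
    if a >= FP8_E4M3_MAX:
        return sign * FP8_E4M3_MAX
    _, e = math.frexp(a)                  # a = m * 2**e with 0.5 <= m < 1
    exp = max(e - 1, FP8_E4M3_MIN_EXP)    # subnormals share the minimum exponent
    step = 2.0 ** (exp - MANTISSA_BITS)   # spacing between representable values
    q = round(a / step) * step            # Python's round() is ties-to-even
    return sign * min(q, FP8_E4M3_MAX)

def plain_fp8(weights):
    """Cast each weight directly to FP8."""
    return [to_fp8_e4m3(w) for w in weights]

def scaled_fp8(weights):
    """Per-tensor scale: map the largest |w| to the top of the FP8 range, then undo it."""
    scale = max(abs(w) for w in weights) / FP8_E4M3_MAX
    return [to_fp8_e4m3(w / scale) * scale for w in weights]

def max_abs_error(orig, approx):
    return max(abs(o - a) for o, a in zip(orig, approx))

# Hypothetical small-magnitude weights
weights = [0.0005, -0.0012, 0.003, 0.0008]
print("plain :", plain_fp8(weights), max_abs_error(weights, plain_fp8(weights)))
print("scaled:", scaled_fp8(weights), max_abs_error(weights, scaled_fp8(weights)))
```

With plain casting, the smallest weights fall below the subnormal range and round to zero. The per-tensor scale keeps them representable.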

Also, if you're using ComfyUI, you may want to experiment with CFGRenorm, CFGNorm, and/or the Tangential Damping CFG node. All of them are native to ComfyUI and may help at higher CFG scales.
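Roughly speaking, these nodes rein in the guided prediction when a high scale pushes it too far. The sketch below shows plain CFG next to a simple norm cap, which keeps the guided output's norm no larger than the conditional prediction's. It is a hypothetical pure-Python illustration of norm-based rescaling in general, loosely in the spirit of CFGNorm. It is not the actual node code, and `cfg_combine` and `norm_capped` are made-up names. Real nodes operate on latent tensors, not a single toy vector.

```python
import math

def cfg_combine(cond, uncond, scale):
    """Standard classifier-free guidance: uncond + scale * (cond - uncond)."""
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]

def l2_norm(v):
    return math.sqrt(sum(x * x for x in v))

def norm_capped(guided, cond, eps=1e-8):
    """Shrink `guided` so its L2 norm does not exceed that of `cond` (never grows it)."""
    ratio = min(1.0, l2_norm(cond) / (l2_norm(guided) + eps))
    return [g * ratio for g in guided]

# Toy 3-element "predictions" (hypothetical numbers, not real model outputs)
cond = [0.6, -0.2, 0.4]
uncond = [0.5, 0.1, 0.3]

for scale in (5.0, 12.0):
    guided = cfg_combine(cond, uncond, scale)
    capped = norm_capped(guided, cond)
    print(f"cfg={scale:>4}: |guided|={l2_norm(guided):.3f}  |capped|={l2_norm(capped):.3f}")
```

The guided norm grows with the scale, while the capped version stays bounded. This trades some of the extra "push" for fewer blown-out values, which is consistent with CFGNorm removing the discoloration but also softening the CFG: 12 look.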

Edit: checking the metadata and stuff now; I'll edit again once I've looked at it.

Edit 2: Another thing I'd recommend is trying a different sampler. dpmpp_2m or Gradient Estimation may work better for you at lower CFG scales. Euler is good and standard, but since it is essentially a simple lerp, the tricks the other samplers use could help. Gradient Estimation is the closest to Euler (it's a lerp extrapolated beyond a weight of 1.0), and I think it would give a better experience. There are also the RL models, which are available in this repo or in Lode's debug repo, and there may be more or newer ones soon. In my opinion, they produce far more coherent images at lower CFG scales.
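Here is a scalar sketch of what "Euler is practically a lerp" and "Gradient Estimation extrapolates beyond 1.0" mean. It uses the k-diffusion-style derivative `d = (x - denoised) / sigma`. For the gradient-estimation-style step, it blends the current and previous `d` with a weight `gamma`. I believe ComfyUI's default is `gamma = 2.0`, i.e. `2*d - d_prev`, but treat that as an assumption. The toy denoiser and sigma schedule are hypothetical.

```python
def euler_step(x, denoised, sigma, sigma_next):
    """One Euler step in the k-diffusion parameterisation."""
    d = (x - denoised) / sigma
    return x + d * (sigma_next - sigma)

def lerp(a, b, t):
    return a + (b - a) * t

def sample(x, sigmas, denoise_fn, gamma=None):
    """Plain Euler when gamma is None; otherwise a gradient-estimation-style
    update that uses gamma * d + (1 - gamma) * d_prev as the step direction."""
    d_prev = None
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        denoised = denoise_fn(x, sigma)
        d = (x - denoised) / sigma
        if gamma is None or d_prev is None:
            d_bar = d  # plain Euler, or first step with no history yet
        else:
            d_bar = gamma * d + (1.0 - gamma) * d_prev  # gamma > 1 extrapolates past d
        x = x + d_bar * (sigma_next - sigma)
        d_prev = d
    return x

# An Euler step equals a lerp between the prediction and x, weighted by sigma_next / sigma
x, denoised, sigma, sigma_next = 1.5, 0.5, 1.0, 0.4
print(euler_step(x, denoised, sigma, sigma_next), lerp(denoised, x, sigma_next / sigma))

def toy_denoiser(x, sigma):
    """Hypothetical denoiser whose prediction drifts slightly with sigma."""
    return 0.5 + 0.1 * sigma

sigmas = [1.0, 0.6, 0.3, 0.1, 0.0]
print("euler:", sample(1.5, sigmas, toy_denoiser))
print("ge   :", sample(1.5, sigmas, toy_denoiser, gamma=2.0))
```

With `gamma = 1.0`, the blended direction reduces to `d`, so the update collapses back to plain Euler. Values above 1 lean on the trend between successive steps.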

CSM360

Jul 26

Quoting Clybius: "Are you utilizing the model in BF16 format? If you aren't, that may help given you can fit it in your GPU. If you can't utilize BF16 or are using regular FP8, a scaled implementation of FP8 may also help instead of regular FP8."

By "using in BF16 format", do you mean the weight_dtype? I have it set to default. I have a 4070 Ti.
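As a rough back-of-the-envelope check of whether BF16 fits, weight memory is the parameter count times bytes per element. The ~8.9B parameter figure for Chroma is an assumption here, and 12 GB is the 4070 Ti's stock VRAM. The text encoder, VAE, and activations come on top of this, so the result is a lower bound.

```python
# Bytes per element for common weight dtypes
BYTES_PER_ELEMENT = {"fp32": 4, "bf16": 2, "fp16": 2, "fp8": 1}

def weight_gb(num_params: float, dtype: str) -> float:
    """Approximate size of the model weights alone, in GB (10**9 bytes)."""
    return num_params * BYTES_PER_ELEMENT[dtype] / 1e9

CHROMA_PARAMS = 8.9e9   # assumption: approximate Chroma parameter count
GPU_VRAM_GB = 12.0      # RTX 4070 Ti

for dtype in ("bf16", "fp8"):
    size = weight_gb(CHROMA_PARAMS, dtype)
    verdict = "fits entirely" if size < GPU_VRAM_GB else "does not fit entirely"
    print(f"{dtype}: ~{size:.1f} GB of weights, {verdict} in {GPU_VRAM_GB:.0f} GB")
```

ComfyUI can offload part of a model to system RAM, so BF16 may still run on 12 GB, just more slowly.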

I tried the various methods you mentioned. Some of them do improve coherence at CFG: 5, and CFGNorm does remove the weird color at CFG: 12. However, the image quality is nowhere close to what I get with regular CFG: 12.

bp0

Jul 26

edited Jul 26

Hi @CSM360,
You may want to try the Skimmed CFG extension.
I haven't tested it thoroughly, but it allows setting very high CFG (like 32, 64, even 100) without burning / overcooking.
It should also let you use the SDE versions of the DPM++ samplers, but again, I don't have much experience with it.

bp0

Jul 26

edited Jul 26

@CSM360, I did some quick tests for you:

- Using Skimmed CFG's linear interpolation dual scales at 5.0 / 5.0 with a KSampler CFG of 12-96 seems to improve things a bit. I didn't tweak the parameters, but it may be worth exploring.
- Removing CONCAT does indeed improve things.
- What improved things quite a lot was removing the word "canyon" from your prompt. It seems the model interprets it as a rock canyon (sorry, I'm not a native English speaker).

Here is what I got without Skimmed CFG, at CFG 5.0, without CONCAT and with "canyon" removed:

I hope it helps.

Edit: and here is the same setup, this time with Skimmed CFG at 5.0 / 5.0 and a KSampler CFG of 32:

bp0

Jul 26

edited Jul 26

Another interesting finding: in your prompt, "aesthetic 10" could perhaps be written as aesthetic:10. But, as someone wrote in another thread, it may be useful for photos rather than anime style (I don't know myself; I only do photos).
Writing aesthetic:10 dramatically changes the image, even with the word "canyon" restored: no more broken buildings.

So here is a third image (no CONCAT, no Skimmed CFG, CFG 5.0, with "canyon" and aesthetic:10):

Now, if I remove aesthetic:10 and keep "canyon", I get something not bad, but with a few issues in the buildings.
Here is a fourth (and last) image, the same as above but with both aesthetic:10 and "canyon" removed:

It may not be the style you're after, but I hope this gives you some useful paths to experiment with.

Edit: I did some checks, and I may be wrong about the aesthetic:xx notation, so take it with a pinch of salt. It certainly influences results, though: aesthetic:10 and aesthetic:3 do generate different images. TBH I never use it myself.
It seems you can also write aesthetic10, but that remains to be tested.

Dagdg

Jul 26

edited Jul 26

Quoting bp0: "Another interesting finding: in your prompt aesthetic 10 could possibly be aesthetic:10. But, as someone wrote in another thread, it may be useful for photos, not for anime style (but I don't know myself, I only do photos).
Writing aesthetic:10 dramatically changes the image, even with the word canyon restored: no more broken buildings."

aesthetic:10 and aesthetic 10 aren't really different. I read someone saying you can append : to single words without () and still get emphasis, so "laugh:10" and "(laugh:10)" would be the same thing. That seems to be false. The : is just interpreted as an ordinary character/token/whatever.
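To make the distinction concrete, here is a deliberately simplified parser for the `(text:weight)` emphasis syntax. It is a hypothetical sketch, not ComfyUI's or Forge's actual prompt parser, which also handles nesting, escapes, and bare parentheses. It shows why a bare `laugh:10` stays literal text while `(laugh:10)` carries a weight.

```python
import re

# Simplified: only "(text:number)" is recognised; no nesting, escapes, or bare "(text)"
WEIGHTED = re.compile(r"\(([^():]+):\s*(-?\d+(?:\.\d+)?)\)")

def parse_weights(prompt: str) -> list[tuple[str, float]]:
    """Split a prompt into (text, weight) chunks; unweighted text gets weight 1.0."""
    chunks = []
    pos = 0
    for m in WEIGHTED.finditer(prompt):
        if m.start() > pos:
            chunks.append((prompt[pos:m.start()], 1.0))  # plain text before the group
        chunks.append((m.group(1), float(m.group(2))))   # weighted group
        pos = m.end()
    if pos < len(prompt):
        chunks.append((prompt[pos:], 1.0))               # trailing plain text
    return chunks

print(parse_weights("laugh:10"))                     # bare colon: literal text
print(parse_weights("(laugh:10)"))                   # parenthesised: weighted
print(parse_weights("city, (aesthetic:10), night"))
```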

CSM360

Jul 27

Thanks for the suggestions, everyone! Some of them gave decent results, but ultimately none of them matched the CFG: 12 image quality.

So I went with the dumbest solution imaginable: I'm just using CFG: 12 and cropping out the broken pixels.

Hey, if it works, it works, huh.

seedmanc

Jul 29

You're lucky to get that; this is what I get with the same settings in Forge. I still can't figure out the regular version of Chroma. Only the flash and fewer-step versions work okay-ish.
