arxiv:2506.19713

Guidance in the Frequency Domain Enables High-Fidelity Sampling at Low CFG Scales

Published on Jun 24 · Submitted by msadat97 on Jun 25


Authors: Seyedmorteza Sadat, Romann M. Weber

AI-generated summary

Frequency-decoupled guidance (FDG) enhances image quality and diversity by separately controlling low- and high-frequency guidance components in diffusion models, outperforming standard classifier-free guidance.

Abstract

Classifier-free guidance (CFG) has become an essential component of modern conditional diffusion models. Although highly effective in practice, the underlying mechanisms by which CFG enhances quality, detail, and prompt alignment are not fully understood. We present a novel perspective on CFG by analyzing its effects in the frequency domain, showing that low and high frequencies have distinct impacts on generation quality. Specifically, low-frequency guidance governs global structure and condition alignment, while high-frequency guidance mainly enhances visual fidelity. However, applying a uniform scale across all frequencies -- as is done in standard CFG -- leads to oversaturation and reduced diversity at high scales and degraded visual quality at low scales. Based on these insights, we propose frequency-decoupled guidance (FDG), an effective approach that decomposes CFG into low- and high-frequency components and applies separate guidance strengths to each component. FDG improves image quality at low guidance scales and avoids the drawbacks of high CFG scales by design. Through extensive experiments across multiple datasets and models, we demonstrate that FDG consistently enhances sample fidelity while preserving diversity, leading to improved FID and recall compared to CFG, establishing our method as a plug-and-play alternative to standard classifier-free guidance.
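The decomposition described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation. It assumes the common convention `uncond + w * (cond - uncond)` for the guidance scale, treats model predictions as 1D lists of floats rather than image or latent tensors, and uses a simple moving average as a stand-in low-pass filter. The names `low_pass`, `cfg_guidance`, and `fdg_guidance` are hypothetical, and the abstract does not say which frequency transform the paper uses.

```python
def low_pass(signal, radius=2):
    """Moving-average low-pass filter (an assumed, illustrative choice).

    Windows are truncated at the edges, so the filter stays linear.
    """
    n = len(signal)
    out = []
    for i in range(n):
        window = signal[max(0, i - radius):min(n, i + radius + 1)]
        out.append(sum(window) / len(window))
    return out


def cfg_guidance(cond, uncond, w):
    """Standard CFG: one scale w applied uniformly to the whole prediction."""
    return [u + w * (c - u) for c, u in zip(cond, uncond)]


def fdg_guidance(cond, uncond, w_low, w_high, radius=2):
    """Frequency-decoupled guidance sketch: separate scales per band."""
    # Split each prediction into a low-frequency part and a residual
    # high-frequency part, so that low + high reconstructs the input.
    low_c, low_u = low_pass(cond, radius), low_pass(uncond, radius)
    high_c = [c - l for c, l in zip(cond, low_c)]
    high_u = [u - l for u, l in zip(uncond, low_u)]

    # Apply CFG independently in each band, then recombine.
    low = cfg_guidance(low_c, low_u, w_low)
    high = cfg_guidance(high_c, high_u, w_high)
    return [l + h for l, h in zip(low, high)]


if __name__ == "__main__":
    # Made-up predictions, purely for illustration.
    cond = [0.0, 1.0, 4.0, 9.0, 16.0, 25.0]
    uncond = [1.0, 0.0, 2.0, 5.0, 10.0, 20.0]
    # A low scale for global structure and a higher scale for detail.
    print(fdg_guidance(cond, uncond, w_low=1.5, w_high=4.0))
```

Because the split is linear and the two bands sum back to the original prediction, setting `w_low == w_high` recovers standard CFG in this sketch. Choosing different values is what lets the low-frequency scale stay small while the high-frequency scale is raised.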


Community

msadat97 (paper author, paper submitter), about 23 hours ago

TLDR: We show that applying classifier-free guidance in the frequency domain substantially improves the quality at low guidance scales, while inherently avoiding the shortcomings associated with high guidance values.

abfauhwf, about 7 hours ago

Gave it a shot, and this has massively improved the sampling quality I can get out of v-pred models, since I can use a very low low-frequency CFG scale and still get good details!

Really good idea overall, very surprised I haven't seen something of this kind until now.

