Abstract
Recent large-scale diffusion models generate high-quality images but struggle to learn new, personalized artistic styles, which limits the creation of unique style templates. Fine-tuning with reference images is the most promising approach, but it often blindly reuses the objectives and noise-level distributions from pre-training, leading to suboptimal style alignment. We propose the Style-friendly SNR sampler, which aggressively shifts the signal-to-noise ratio (SNR) distribution toward higher noise levels during fine-tuning to focus on the noise levels where stylistic features emerge. This enables models to better capture unique styles and generate images with higher style alignment. Our method allows diffusion models to learn and share new "style templates," enhancing personalized content creation. We demonstrate the ability to generate styles such as personal watercolor paintings, minimal flat cartoons, 3D renderings, multi-panel images, and memes with text, thereby broadening the scope of style-driven generation.
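To make the core idea concrete, here is a minimal PyTorch sketch of a shifted log-SNR sampler. The mean and standard deviation (mu = -6, sigma = 2) are illustrative values for biasing training toward high noise, not necessarily the paper's exact settings, and the log-SNR-to-timestep mapping assumes the rectified-flow parameterization used by models like FLUX and SD3.5.

```python
import torch

def style_friendly_timesteps(batch_size, mu=-6.0, sigma=2.0, device="cpu"):
    """Sample flow-matching timesteps whose log-SNR follows N(mu, sigma^2).

    For rectified flow, x_t = (1 - t) * x0 + t * eps, so
    SNR(t) = ((1 - t) / t)^2 and lambda = log SNR = 2 * log((1 - t) / t).
    Inverting gives t = sigmoid(-lambda / 2); a strongly negative mu
    concentrates training at high noise levels (t close to 1).
    """
    lam = mu + sigma * torch.randn(batch_size, device=device)  # log-SNR samples
    return torch.sigmoid(-lam / 2)

# Usage during fine-tuning: draw timesteps and form noisy latents.
x0 = torch.randn(4, 16, 64, 64)   # clean latents (illustrative shape)
eps = torch.randn_like(x0)        # Gaussian noise
t = style_friendly_timesteps(x0.shape[0]).view(-1, 1, 1, 1)
x_t = (1 - t) * x0 + t * eps      # high-noise samples dominate
```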
Community
We propose a Style-friendly sampler that shifts the diffusion fine-tuning toward higher noise levels, enabling FLUX and SD3.5 to effectively learn new, unique artistic styles and expand the scope of style-driven generation.
This is an automated message from the Librarian Bot. The following similar papers were recommended by the Semantic Scholar API:
- Stylecodes: Encoding Stylistic Information For Image Generation (2024)
- Towards Multi-View Consistent Style Transfer with One-Step Diffusion via Vision Conditioning (2024)
- StyleTex: Style Image-Guided Texture Generation for 3D Models (2024)
- Zoomed In, Diffused Out: Towards Local Degradation-Aware Multi-Diffusion for Extreme Image Super-Resolution (2024)
- DiffuseST: Unleashing the Capability of the Diffusion Model for Style Transfer (2024)
- Bridging Text and Image for Artist Style Transfer via Contrastive Learning (2024)
- Using Style Ambiguity Loss to Improve Aesthetics of Diffusion Models (2024)
"We fine-tune both FLUX-dev [29] and SD3.5 [8, 57] by training LoRA [19] adapters on specific layers to capture new styles. "
The specific layers are not found in the paper, can you tell the names of the relevant specific layers?
I’m sorry for the late reply.
In the relevant paragraph, we mention that we train LoRA adapters on the attention layers that handle text tokens, as well as the attention layers that handle image tokens, in the dual-stream transformer blocks of MM-DiT. For more details, please refer to Figure S12 in the supplementary material, which briefly shows how the LoRA modules are integrated.
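For reference, a hypothetical sketch of such a configuration with the peft library is shown below. The module-name patterns follow diffusers' MM-DiT attention naming (to_q/to_k/to_v/to_out.0 for image tokens, add_q_proj/add_k_proj/add_v_proj/to_add_out for text tokens); these names are assumptions that may differ across codebases, and the rank/alpha values are placeholders rather than the paper's settings.

```python
from peft import LoraConfig

# Hypothetical configuration; module names follow the diffusers MM-DiT
# attention naming and may differ in other implementations.
lora_config = LoraConfig(
    r=32,           # placeholder rank
    lora_alpha=32,  # placeholder scaling
    target_modules=[
        # attention projections for image tokens
        "attn.to_q", "attn.to_k", "attn.to_v", "attn.to_out.0",
        # attention projections for text tokens (dual-stream blocks)
        "attn.add_q_proj", "attn.add_k_proj", "attn.add_v_proj",
        "attn.to_add_out",
    ],
)
# transformer.add_adapter(lora_config)  # e.g., a diffusers FLUX transformer
```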