OmniConsistency: Learning Style-Agnostic Consistency from Paired Stylization Data
Abstract
OmniConsistency is a universal consistency plugin built on large-scale Diffusion Transformers that improves stylization consistency and generalization in image-to-image pipelines without style degradation.
Diffusion models have advanced image stylization significantly, yet two core challenges persist: (1) maintaining consistent stylization in complex scenes, particularly identity, composition, and fine details, and (2) preventing style degradation in image-to-image pipelines with style LoRAs. GPT-4o's exceptional stylization consistency highlights the performance gap between open-source methods and proprietary models. To bridge this gap, we propose OmniConsistency, a universal consistency plugin leveraging large-scale Diffusion Transformers (DiTs). OmniConsistency contributes: (1) an in-context consistency learning framework trained on aligned image pairs for robust generalization; (2) a two-stage progressive learning strategy decoupling style learning from consistency preservation to mitigate style degradation; and (3) a fully plug-and-play design compatible with arbitrary style LoRAs under the Flux framework. Extensive experiments show that OmniConsistency significantly enhances visual coherence and aesthetic quality, achieving performance comparable to the commercial state-of-the-art GPT-4o.
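The two-stage decoupling described in point (2) can be pictured schematically as follows. This is a minimal sketch only: the module classes, feature dimensions, random tensors, mean-squared-error loss, and iteration counts are all placeholders standing in for the actual DiT features and training objective, not the OmniConsistency implementation (see the official repo for that).

```python
# Schematic sketch of a two-stage progressive training loop:
# stage 1 learns per-style LoRAs, stage 2 freezes them and trains
# a style-agnostic consistency module on aligned image pairs.
# Everything below uses toy stand-ins, not the real architecture.
import random
import torch
import torch.nn as nn

class StyleLoRA(nn.Module):
    """Stand-in for a per-style LoRA adapter (low-rank residual)."""
    def __init__(self, dim: int = 64, rank: int = 4):
        super().__init__()
        self.down = nn.Linear(dim, rank, bias=False)
        self.up = nn.Linear(rank, dim, bias=False)

    def forward(self, x):
        return x + self.up(self.down(x))

class ConsistencyModule(nn.Module):
    """Stand-in for the consistency plugin conditioned on source features."""
    def __init__(self, dim: int = 64):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, src_feat, styled_feat):
        # Inject source-image information into the stylized branch.
        return styled_feat + self.proj(src_feat)

dim = 64
styles = {name: StyleLoRA(dim) for name in ["sketch", "anime", "oil"]}
consistency = ConsistencyModule(dim)

# ---- Stage 1: train each style LoRA on stylized data only ----
for name, lora in styles.items():
    opt = torch.optim.AdamW(lora.parameters(), lr=1e-4)
    for _ in range(10):                       # toy iteration count
        styled_target = torch.randn(8, dim)   # placeholder stylized features
        loss = (lora(torch.randn(8, dim)) - styled_target).pow(2).mean()
        opt.zero_grad(); loss.backward(); opt.step()

# ---- Stage 2: freeze the style LoRAs, train only the consistency module ----
for lora in styles.values():
    lora.requires_grad_(False)
opt = torch.optim.AdamW(consistency.parameters(), lr=1e-4)
for _ in range(10):
    lora = random.choice(list(styles.values()))  # rotate styles per step
    src = torch.randn(8, dim)                    # source image features
    target = torch.randn(8, dim)                 # aligned stylized features
    pred = consistency(src, lora(src))
    loss = (pred - target).pow(2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```

Keeping the style LoRAs frozen in stage 2 is what makes the learned consistency module style-agnostic: it never gets a chance to overwrite the style weights, so it can later be paired with LoRAs it has never seen.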
Community
Open-source breakthrough!
We recreated GPT-4o-level stylization consistency using only 2,600 pairs + 500 GPU hours!
Introducing OmniConsistency:
⚡ Super strong style + content consistency
⚡ Plug-and-play, works with any Flux LoRA (usage sketch below)
⚡ Lightweight, rivals top commercial APIs
Demo: https://huggingface.co/spaces/yiren98/OmniConsistency
Code: https://github.com/showlab/OmniConsistency
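A rough usage sketch of the plug-and-play claim, assuming the released weights can be combined with a community style LoRA through diffusers' standard Flux image-to-image pipeline. The repository IDs, LoRA names, prompt, and sampler settings below are placeholders, and the project may require its own pipeline code rather than plain LoRA loading; check the linked repo for the actual procedure.

```python
# Hypothetical inference sketch: stacking a style LoRA with the
# OmniConsistency weights on top of Flux via diffusers.
# Repo IDs and adapter names are placeholders, not verified checkpoints.
import torch
from diffusers import FluxImg2ImgPipeline
from diffusers.utils import load_image

pipe = FluxImg2ImgPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

# Any Flux style LoRA plus the consistency plugin (placeholder repo IDs).
pipe.load_lora_weights("your-org/flux-style-lora", adapter_name="style")
pipe.load_lora_weights("showlab/OmniConsistency", adapter_name="consistency")
pipe.set_adapters(["style", "consistency"], adapter_weights=[1.0, 1.0])

source = load_image("input.png")          # image to be restylized
result = pipe(
    prompt="the same scene rendered in the target style",
    image=source,
    strength=0.9,
    num_inference_steps=28,
    guidance_scale=3.5,
).images[0]
result.save("stylized.png")
```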
This is an automated message from the Librarian Bot. The following similar papers were recommended by the Semantic Scholar API:
- ICAS: IP Adapter and ControlNet-based Attention Structure for Multi-Subject Style Transfer Optimization (2025)
- LLM-Enabled Style and Content Regularization for Personalized Text-to-Image Generation (2025)
- InstantCharacter: Personalize Any Characters with a Scalable Diffusion Transformer Framework (2025)
- OmniStyle: Filtering High Quality Style Transfer Data at Scale (2025)
- StyleAR: Customizing Multimodal Autoregressive Model for Style-Aligned Text-to-Image Generation (2025)
- A Training-Free Style-aligned Image Generation with Scale-wise Autoregressive Model (2025)
- FlexIP: Dynamic Control of Preservation and Personality for Customized Image Generation (2025)