OmniConsistency: Learning Style-Agnostic Consistency from Paired Stylization Data
Abstract
OmniConsistency is a universal consistency plugin built on large-scale Diffusion Transformers that improves stylization consistency and generalization in image-to-image pipelines without style degradation.
Diffusion models have advanced image stylization significantly, yet two core challenges persist: (1) maintaining consistent stylization in complex scenes, particularly identity, composition, and fine details, and (2) preventing style degradation in image-to-image pipelines with style LoRAs. GPT-4o's exceptional stylization consistency highlights the performance gap between open-source methods and proprietary models. To bridge this gap, we propose OmniConsistency, a universal consistency plugin leveraging large-scale Diffusion Transformers (DiTs). OmniConsistency contributes: (1) an in-context consistency learning framework trained on aligned image pairs for robust generalization; (2) a two-stage progressive learning strategy decoupling style learning from consistency preservation to mitigate style degradation; and (3) a fully plug-and-play design compatible with arbitrary style LoRAs under the Flux framework. Extensive experiments show that OmniConsistency significantly enhances visual coherence and aesthetic quality, achieving performance comparable to the commercial state-of-the-art GPT-4o.
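The two-stage decoupling described in point (2) can be pictured schematically as follows. This is a minimal sketch only: the module classes, feature dimensions, random tensors, mean-squared-error loss, and iteration counts are all placeholders standing in for the actual DiT features and training objective, not the OmniConsistency implementation (see the official repo for that).

```python
# Schematic sketch of a two-stage progressive training loop:
# stage 1 learns per-style LoRAs, stage 2 freezes them and trains
# a style-agnostic consistency module on aligned image pairs.
# Everything below uses toy stand-ins, not the real architecture.
import random
import torch
import torch.nn as nn

class StyleLoRA(nn.Module):
    """Stand-in for a per-style LoRA adapter (low-rank residual)."""
    def __init__(self, dim: int = 64, rank: int = 4):
        super().__init__()
        self.down = nn.Linear(dim, rank, bias=False)
        self.up = nn.Linear(rank, dim, bias=False)

    def forward(self, x):
        return x + self.up(self.down(x))

class ConsistencyModule(nn.Module):
    """Stand-in for the consistency plugin conditioned on source features."""
    def __init__(self, dim: int = 64):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, src_feat, styled_feat):
        # Inject source-image information into the stylized branch.
        return styled_feat + self.proj(src_feat)

dim = 64
styles = {name: StyleLoRA(dim) for name in ["sketch", "anime", "oil"]}
consistency = ConsistencyModule(dim)

# ---- Stage 1: train each style LoRA on stylized data only ----
for name, lora in styles.items():
    opt = torch.optim.AdamW(lora.parameters(), lr=1e-4)
    for _ in range(10):                       # toy iteration count
        styled_target = torch.randn(8, dim)   # placeholder stylized features
        loss = (lora(torch.randn(8, dim)) - styled_target).pow(2).mean()
        opt.zero_grad(); loss.backward(); opt.step()

# ---- Stage 2: freeze the style LoRAs, train only the consistency module ----
for lora in styles.values():
    lora.requires_grad_(False)
opt = torch.optim.AdamW(consistency.parameters(), lr=1e-4)
for _ in range(10):
    lora = random.choice(list(styles.values()))  # rotate styles per step
    src = torch.randn(8, dim)                    # source image features
    target = torch.randn(8, dim)                 # aligned stylized features
    pred = consistency(src, lora(src))
    loss = (pred - target).pow(2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```

Keeping the style LoRAs frozen in stage 2 is what makes the learned consistency module style-agnostic: it never gets a chance to overwrite the style weights, so it can later be paired with LoRAs it has never seen.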
Community
Open-source breakthrough!
We recreated GPT-4o-level stylization consistency using only 2,600 pairs + 500 GPU hours!
Introducing OmniConsistency:
⚡ Super strong style + content consistency
⚡ Plug-and-play, works with any Flux LoRA (usage sketch below)
⚡ Lightweight, rivals top commercial APIs
Demo: https://huggingface.co/spaces/yiren98/OmniConsistency
Code: https://github.com/showlab/OmniConsistency
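A rough usage sketch of the plug-and-play claim, assuming the released weights can be combined with a community style LoRA through diffusers' standard Flux image-to-image pipeline. The repository IDs, LoRA names, prompt, and sampler settings below are placeholders, and the project may require its own pipeline code rather than plain LoRA loading; check the linked repo for the actual procedure.

```python
# Hypothetical inference sketch: stacking a style LoRA with the
# OmniConsistency weights on top of Flux via diffusers.
# Repo IDs and adapter names are placeholders, not verified checkpoints.
import torch
from diffusers import FluxImg2ImgPipeline
from diffusers.utils import load_image

pipe = FluxImg2ImgPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

# Any Flux style LoRA plus the consistency plugin (placeholder repo IDs).
pipe.load_lora_weights("your-org/flux-style-lora", adapter_name="style")
pipe.load_lora_weights("showlab/OmniConsistency", adapter_name="consistency")
pipe.set_adapters(["style", "consistency"], adapter_weights=[1.0, 1.0])

source = load_image("input.png")          # image to be restylized
result = pipe(
    prompt="the same scene rendered in the target style",
    image=source,
    strength=0.9,
    num_inference_steps=28,
    guidance_scale=3.5,
).images[0]
result.save("stylized.png")
```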
This is an automated message from the Librarian Bot. The following similar papers were recommended by the Semantic Scholar API:
- ICAS: IP Adapter and ControlNet-based Attention Structure for Multi-Subject Style Transfer Optimization (2025)
- LLM-Enabled Style and Content Regularization for Personalized Text-to-Image Generation (2025)
- InstantCharacter: Personalize Any Characters with a Scalable Diffusion Transformer Framework (2025)
- OmniStyle: Filtering High Quality Style Transfer Data at Scale (2025)
- StyleAR: Customizing Multimodal Autoregressive Model for Style-Aligned Text-to-Image Generation (2025)
- A Training-Free Style-aligned Image Generation with Scale-wise Autoregressive Model (2025)
- FlexIP: Dynamic Control of Preservation and Personality for Customized Image Generation (2025)