Papers
arxiv:2507.13984

CSD-VAR: Content-Style Decomposition in Visual Autoregressive Models

Published on Jul 18
· Submitted by nqbinh on Jul 21
Authors:
,
,
,

Abstract

CSD-VAR, a Visual Autoregressive Modeling approach, enhances content-style decomposition by introducing scale-aware optimization, SVD-based rectification, and augmented K-V memory, outperforming diffusion models in content preservation and stylization.

AI-generated summary

Disentangling content and style from a single image, known as content-style decomposition (CSD), enables recontextualization of extracted content and stylization of extracted styles, offering greater creative flexibility in visual synthesis. While recent personalization methods have explored the decomposition of explicit content style, they remain tailored for diffusion models. Meanwhile, Visual Autoregressive Modeling (VAR) has emerged as a promising alternative with a next-scale prediction paradigm, achieving performance comparable to that of diffusion models. In this paper, we explore VAR as a generative framework for CSD, leveraging its scale-wise generation process for improved disentanglement. To this end, we propose CSD-VAR, a novel method that introduces three key innovations: (1) a scale-aware alternating optimization strategy that aligns content and style representation with their respective scales to enhance separation, (2) an SVD-based rectification method to mitigate content leakage into style representations, and (3) an Augmented Key-Value (K-V) memory enhancing content identity preservation. To benchmark this task, we introduce CSD-100, a dataset specifically designed for content-style decomposition, featuring diverse subjects rendered in various artistic styles. Experiments demonstrate that CSD-VAR outperforms prior approaches, achieving superior content preservation and stylization fidelity.

Community

Paper author Paper submitter
This comment has been hidden
Paper author Paper submitter

A novel content-style personalization method for Visual Autoregressive Models (Scale-Wise AR)

Paper author Paper submitter
edited 4 days ago

We explored the inherent scale bias of VAR to enable more efficient content-style decomposition, outperforming diffusion-based methods

Really cool paper!

Sign up or log in to comment

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2507.13984 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2507.13984 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2507.13984 in a Space README.md to link it from this page.

Collections including this paper 1