arxiv:2506.03621

Negative-Guided Subject Fidelity Optimization for Zero-Shot Subject-Driven Generation

Published on Jun 4

Submitted by chaehun on Jun 4


Authors: Chaehun Shin, Jooyoung Choi

Abstract

AI-generated summary: Subject Fidelity Optimization (SFO) enhances zero-shot subject-driven generation by introducing synthetic negative samples and reweighting diffusion timesteps toward intermediate steps, outperforming baselines in subject fidelity and text alignment on a subject-driven generation benchmark.

We present Subject Fidelity Optimization (SFO), a novel comparative learning framework for zero-shot subject-driven generation that enhances subject fidelity. Unlike supervised fine-tuning methods, which rely only on positive targets and reuse the diffusion loss from the pre-training stage, SFO introduces synthetic negative targets and explicitly guides the model to favor positives over negatives through pairwise comparison. For negative targets, we propose Condition-Degradation Negative Sampling (CDNS), which automatically generates distinctive and informative negatives by intentionally degrading visual and textual cues, without expensive human annotations. Moreover, we reweight the diffusion timesteps to focus fine-tuning on intermediate steps, where subject details emerge. Extensive experiments demonstrate that SFO with CDNS significantly outperforms baselines in terms of both subject fidelity and text alignment on a subject-driven generation benchmark. Project page: https://subjectfidelityoptimization.github.io/
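The abstract does not state the exact objective, so the following is only a minimal sketch of how a pairwise comparison between a positive and a negative target could be combined with timestep reweighting. Everything in it is an assumption made for illustration: the logistic (Bradley-Terry style) form of the loss, the bell-shaped weight over a normalized timestep, and the names `timestep_weight`, `pairwise_preference_loss`, `weighted_loss`, `beta`, `center`, and `width` are hypothetical and are not taken from the paper. The inputs `err_pos` and `err_neg` stand for the model's denoising errors on the positive and negative targets.

```python
import math


def timestep_weight(t, center=0.5, width=0.2):
    """Hypothetical bell-shaped weight over a normalized timestep t in [0, 1].

    It peaks at `center`, so intermediate timesteps count more than the
    two ends. The shape and its parameters are assumptions.
    """
    return math.exp(-((t - center) ** 2) / (2.0 * width ** 2))


def pairwise_preference_loss(err_pos, err_neg, beta=1.0):
    """Logistic loss on the gap between negative and positive denoising error.

    The loss is small when the positive target is denoised with lower error
    than the negative one, and large when the order is reversed.
    """
    margin = beta * (err_neg - err_pos)
    # -log(sigmoid(margin)) == softplus(-margin), in a numerically stable form.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


def weighted_loss(batch, beta=1.0):
    """Weighted average of the pairwise loss over (t, err_pos, err_neg) triples."""
    total, total_weight = 0.0, 0.0
    for t, err_pos, err_neg in batch:
        w = timestep_weight(t)
        total += w * pairwise_preference_loss(err_pos, err_neg, beta)
        total_weight += w
    return total / total_weight


if __name__ == "__main__":
    # Made-up error values, for illustration only.
    batch = [(0.1, 0.30, 0.35), (0.5, 0.20, 0.60), (0.9, 0.40, 0.45)]
    print(weighted_loss(batch))
```

In this sketch, a pair sampled at an intermediate timestep contributes more to the average than a pair sampled near either end, and the loss only decreases when the positive target is preferred over the negative one.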


Community

chaehun (paper author, paper submitter), 2 days ago

We introduce Subject Fidelity Optimization (SFO), which enhances subject fidelity in zero-shot subject-driven text-to-image generation by introducing negative targets and a comparison-based learning signal that explicitly guides the model on which aspects are desirable and which are not.