XVerse: Consistent Multi-Subject Control of Identity and Semantic Attributes via DiT Modulation

This repository contains the official model of the paper XVerse: Consistent Multi-Subject Control of Identity and Semantic Attributes via DiT Modulation.


Figure: XVerse's capability in single- and multi-subject personalization and semantic attribute control (pose, style, lighting).

Introduction

XVerse introduces a novel approach to multi-subject image synthesis that offers precise, independent control over individual subjects without disrupting the overall image latents or features. It achieves this by transforming reference images into offsets for token-specific text-stream modulation within a Diffusion Transformer (DiT).

This innovation enables high-fidelity, editable image generation where you can robustly control both individual subject characteristics (identity) and their semantic attributes. XVerse significantly enhances capabilities for personalized and complex scene generation.
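The core idea above, turning a reference image into per-token offsets on the text-stream modulation of a DiT block, can be sketched as follows. This is a minimal illustrative sketch, not the official XVerse implementation: the module name, shapes, and the assumption of pooled reference features and a per-subject token mask are all hypothetical.

```python
import torch
import torch.nn as nn


class TokenModulationOffset(nn.Module):
    """Hypothetical sketch of token-specific text-stream modulation.

    Reference-image features are projected to (scale, shift) offsets that
    are added to a DiT block's base modulation, but only on the text
    tokens that refer to that subject (selected by ``subject_mask``).
    """

    def __init__(self, ref_dim: int, hidden_dim: int):
        super().__init__()
        # One linear head producing both a scale offset and a shift offset.
        self.proj = nn.Linear(ref_dim, 2 * hidden_dim)

    def forward(
        self,
        text_tokens: torch.Tensor,   # (B, T, D) text-stream hidden states
        base_scale: torch.Tensor,    # (B, 1, D) block's base modulation scale
        base_shift: torch.Tensor,    # (B, 1, D) block's base modulation shift
        ref_feats: torch.Tensor,     # (B, ref_dim) pooled reference features
        subject_mask: torch.Tensor,  # (B, T) 1.0 where a token names the subject
    ) -> torch.Tensor:
        d_scale, d_shift = self.proj(ref_feats).chunk(2, dim=-1)  # (B, D) each
        mask = subject_mask.unsqueeze(-1)                         # (B, T, 1)
        # Offsets apply only to the masked (subject) tokens; other tokens
        # keep the unmodified base modulation, so the rest of the prompt
        # and image features are left undisturbed.
        scale = base_scale + mask * d_scale.unsqueeze(1)
        shift = base_shift + mask * d_shift.unsqueeze(1)
        return text_tokens * (1 + scale) + shift
```

Because the offsets are gated by the subject mask, multiple subjects can each contribute their own modulation offsets without interfering with one another or with unrelated tokens.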

How to Use

See the usage instructions in the official repository: https://github.com/bytedance/XVerse

Questions or comments about the model can be raised at: https://github.com/bytedance/XVerse/issues

Citation

If XVerse is helpful, please ⭐ the repo.

If you find this project useful for your research, please consider citing our paper:

@article{chen2025xverse,
  title={XVerse: Consistent Multi-Subject Control of Identity and Semantic Attributes via DiT Modulation},
  author={Chen, Bowen and Zhao, Mengyi and Sun, Haomiao and Chen, Li and Wang, Xu and Du, Kang and Wu, Xinglong},
  journal={arXiv preprint arXiv:2506.21416},
  year={2025}
}