XVerse: Consistent Multi-Subject Control of Identity and Semantic Attributes via DiT Modulation

This repository contains the official model of the paper XVerse: Consistent Multi-Subject Control of Identity and Semantic Attributes via DiT Modulation.


Figure: XVerse's capability in single- and multi-subject personalization and semantic attribute control (pose, style, lighting).

Introduction

XVerse introduces a novel approach to multi-subject image synthesis that offers precise, independent control over individual subjects without disrupting the overall image latents or features. It achieves this by transforming reference images into offsets for token-specific text-stream modulation within a Diffusion Transformer (DiT).

This innovation enables high-fidelity, editable image generation where you can robustly control both individual subject characteristics (identity) and their semantic attributes. XVerse significantly enhances capabilities for personalized and complex scene generation.
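The core idea above, turning a reference image into per-token offsets on the text-stream modulation of a DiT block, can be sketched as follows. This is a minimal illustrative sketch, not the official XVerse implementation: the module name, shapes, and the assumption of pooled reference features and a per-subject token mask are all hypothetical.

```python
import torch
import torch.nn as nn


class TokenModulationOffset(nn.Module):
    """Hypothetical sketch of token-specific text-stream modulation.

    Reference-image features are projected to (scale, shift) offsets that
    are added to a DiT block's base modulation, but only on the text
    tokens that refer to that subject (selected by ``subject_mask``).
    """

    def __init__(self, ref_dim: int, hidden_dim: int):
        super().__init__()
        # One linear head producing both a scale offset and a shift offset.
        self.proj = nn.Linear(ref_dim, 2 * hidden_dim)

    def forward(
        self,
        text_tokens: torch.Tensor,   # (B, T, D) text-stream hidden states
        base_scale: torch.Tensor,    # (B, 1, D) block's base modulation scale
        base_shift: torch.Tensor,    # (B, 1, D) block's base modulation shift
        ref_feats: torch.Tensor,     # (B, ref_dim) pooled reference features
        subject_mask: torch.Tensor,  # (B, T) 1.0 where a token names the subject
    ) -> torch.Tensor:
        d_scale, d_shift = self.proj(ref_feats).chunk(2, dim=-1)  # (B, D) each
        mask = subject_mask.unsqueeze(-1)                         # (B, T, 1)
        # Offsets apply only to the masked (subject) tokens; other tokens
        # keep the unmodified base modulation, so the rest of the prompt
        # and image features are left undisturbed.
        scale = base_scale + mask * d_scale.unsqueeze(1)
        shift = base_shift + mask * d_shift.unsqueeze(1)
        return text_tokens * (1 + scale) + shift
```

Because the offsets are gated by the subject mask, multiple subjects can each contribute their own modulation offsets without interfering with one another or with unrelated tokens.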

How to Use

See the usage instructions in the official repository: https://github.com/bytedance/XVerse

Questions or comments about the model can be raised at: https://github.com/bytedance/XVerse/issues

Citation

If XVerse is helpful, please ⭐ the repo.

If you find this project useful for your research, please consider citing our paper:

@article{chen2025xverse,
  title={XVerse: Consistent Multi-Subject Control of Identity and Semantic Attributes via DiT Modulation},
  author={Chen, Bowen and Zhao, Mengyi and Sun, Haomiao and Chen, Li and Wang, Xu and Du, Kang and Wu, Xinglong},
  journal={arXiv preprint arXiv:2506.21416},
  year={2025}
}