XVerse: Consistent Multi-Subject Control of Identity and Semantic Attributes via DiT Modulation
This repository contains the official model for the paper XVerse: Consistent Multi-Subject Control of Identity and Semantic Attributes via DiT Modulation.
Introduction
XVerse introduces a novel approach to multi-subject image synthesis, offering precise and independent control over individual subjects without disrupting the overall image latents or features. We achieve this by transforming reference images into offsets for token-specific text-stream modulation.
This innovation enables high-fidelity, editable image generation where you can robustly control both individual subject characteristics (identity) and their semantic attributes. XVerse significantly enhances capabilities for personalized and complex scene generation.
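To make the idea of token-specific text-stream modulation more concrete, here is a minimal sketch in plain Python. It is not the XVerse implementation (see the GitHub repository for that). It assumes the usual DiT-style modulation form, `norm(x) * (1 + scale) + shift`, and assumes that a reference image has already been encoded into per-token offsets for the shift and scale. The names `modulate_text_tokens` and `token_offsets`, and the toy values, are hypothetical and chosen only for illustration.

```python
import math

def layer_norm(vec, eps=1e-6):
    """Normalize one token vector to zero mean and unit variance."""
    mean = sum(vec) / len(vec)
    var = sum((v - mean) ** 2 for v in vec) / len(vec)
    return [(v - mean) / math.sqrt(var + eps) for v in vec]

def modulate_text_tokens(tokens, shift, scale, token_offsets=None):
    """Apply shared shift/scale modulation to every text token.

    token_offsets (hypothetical) maps a token index to a pair
    (delta_shift, delta_scale) that is added only for that token,
    so other tokens keep the shared modulation unchanged.
    """
    token_offsets = token_offsets or {}
    out = []
    for idx, tok in enumerate(tokens):
        zeros = [0.0] * len(tok)
        d_shift, d_scale = token_offsets.get(idx, (zeros, zeros))
        normed = layer_norm(tok)
        out.append([
            n * (1.0 + sc + dsc) + sh + dsh
            for n, sc, dsc, sh, dsh in zip(normed, scale, d_scale, shift, d_shift)
        ])
    return out

# Toy example: three text tokens of width 4; only token 1 (say, the word
# naming a subject) receives an offset derived from a reference image.
tokens = [[1.0, 2.0, 3.0, 4.0], [0.5, -1.0, 2.0, 0.0], [3.0, 1.0, -2.0, 0.5]]
shift = [0.0] * 4
scale = [0.0] * 4
offsets = {1: ([0.5] * 4, [0.1] * 4)}  # assumed, made-up offset values

base = modulate_text_tokens(tokens, shift, scale)
out = modulate_text_tokens(tokens, shift, scale, offsets)
```

In this sketch only the token at index 1 changes between `base` and `out`, which mirrors the stated goal of controlling one subject without disturbing the rest of the prompt or the image latents.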
How to Use
See https://github.com/bytedance/XVerse
Where to send questions or comments about the model: https://github.com/bytedance/XVerse/issues
Citation
If XVerse is helpful, please consider giving the repo a ⭐.
If you find this project useful for your research, please consider citing our paper:
@article{chen2025xverse,
title={XVerse: Consistent Multi-Subject Control of Identity and Semantic Attributes via DiT Modulation},
author={Chen, Bowen and Zhao, Mengyi and Sun, Haomiao and Chen, Li and Wang, Xu and Du, Kang and Wu, Xinglong},
journal={arXiv preprint arXiv:2506.21416},
year={2025}
}
Model tree for ByteDance/XVerse
Base model: black-forest-labs/FLUX.1-dev