---
license: apache-2.0
---

# Exploring the Deep Fusion of Large Language Models and Diffusion Transformers for Text-to-Image Synthesis

<div align="center">
  <img src="https://github.com/tang-bd/fuse-dit/blob/main/assets/visual.jpg?raw=true" width="95%"/>
</div>

## Resources

- [arXiv: Paper](https://arxiv.org/pdf/2505.10046)
- [GitHub: Code](https://github.com/tang-bd/fuse-dit)

## Quick Start

Download the pre-trained model, then run inference with the `FuseDiTPipeline` class from our codebase:

```python
import torch

from diffusion.pipelines import FuseDiTPipeline

# Load the downloaded pipeline from a local directory and move it to the GPU.
pipeline = FuseDiTPipeline.from_pretrained("/path/to/pipeline/").to("cuda")

image = pipeline(
    "your prompt",
    width=512,
    height=512,
    num_inference_steps=25,
    guidance_scale=6.0,
    use_cache=True,
)[0][0]  # take the first returned image

image.save("test.png")
```
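
To generate images for several prompts, the single call above can be wrapped in a small loop. The following is a minimal sketch, not part of the codebase: `slugify` and `generate_all` are hypothetical helper names, and the code assumes only what the Quick Start call shows, namely that `pipeline(prompt, ...)[0][0]` yields an object with a `.save(path)` method.

```python
import re
from pathlib import Path


def slugify(prompt, max_len=40):
    """Turn a prompt into a short, filesystem-safe file stem."""
    slug = re.sub(r"[^a-z0-9]+", "-", prompt.lower()).strip("-")
    return slug[:max_len].rstrip("-") or "image"


def generate_all(pipeline, prompts, out_dir="outputs", **call_kwargs):
    """Run `pipeline` once per prompt and save each result as a PNG.

    Assumption (taken from the Quick Start call): `pipeline(prompt, ...)[0][0]`
    is an object with a `.save(path)` method, such as a PIL image.
    """
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    saved = []
    for index, prompt in enumerate(prompts):
        # Forward the same keyword arguments to every call.
        image = pipeline(prompt, **call_kwargs)[0][0]
        target = out_path / f"{index:03d}-{slugify(prompt)}.png"
        image.save(str(target))
        saved.append(target)
    return saved
```

With the `pipeline` object loaded as in the Quick Start, a call such as `generate_all(pipeline, ["a red fox", "a blue bird"], width=512, height=512, num_inference_steps=25, guidance_scale=6.0, use_cache=True)` would write one numbered PNG per prompt into `outputs/`.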

## Citation

```bibtex
@article{tang2025exploringdeepfusion,
  title={Exploring the Deep Fusion of Large Language Models and Diffusion Transformers for Text-to-Image Synthesis},
  author={Bingda Tang and Boyang Zheng and Xichen Pan and Sayak Paul and Saining Xie},
  year={2025},
  journal={arXiv preprint arXiv:2505.10046},
}
```