---
license: apache-2.0
library_name: diffusers
pipeline_tag: text-to-image
---
# Exploring the Deep Fusion of Large Language Models and Diffusion Transformers for Text-to-Image Synthesis

<div align="center">
<img src="https://github.com/tang-bd/fuse-dit/blob/main/assets/visual.jpg?raw=true" width="95%"/>
</div>

## Resources

- [arXiv: Paper](https://arxiv.org/pdf/2505.10046)
- [GitHub: Code](https://github.com/tang-bd/fuse-dit)

## Quick Start

Download the pre-trained model, then run inference with `FuseDiTPipeline` from our codebase:

```python
import torch
from diffusion.pipelines import FuseDiTPipeline

# Load the downloaded pipeline from disk and move it to the GPU.
pipeline = FuseDiTPipeline.from_pretrained("/path/to/pipeline/").to("cuda")

# Generate a 512x512 image; [0][0] selects the first image from the pipeline output.
image = pipeline(
    "your prompt",
    width=512,
    height=512,
    num_inference_steps=25,
    guidance_scale=6.0,
    use_cache=True,
)[0][0]
image.save("test.png")
```
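
The snippet above expects the pipeline files to already be on disk. One way to fetch them is `snapshot_download` from the `huggingface_hub` package; the following is a minimal sketch rather than the authors' documented procedure. The helper name `fetch_pipeline`, the default directory, and the `<user>/<model-repo>` placeholder are assumptions for illustration, so substitute the repository ID shown on this model page.

```python
from pathlib import Path


def fetch_pipeline(repo_id: str, local_dir: str = "./fuse-dit-pipeline") -> str:
    """Download every file of a Hub repository into local_dir and return the path.

    Hypothetical helper: the name and default directory are illustrative only.
    """
    # Imported lazily so this module can be loaded before huggingface_hub is installed.
    from huggingface_hub import snapshot_download

    target = Path(local_dir).expanduser()
    target.mkdir(parents=True, exist_ok=True)
    # snapshot_download returns the local folder that holds the downloaded files.
    return snapshot_download(repo_id=repo_id, local_dir=str(target))


# Example usage (placeholder repo ID, replace it before running):
# path = fetch_pipeline("<user>/<model-repo>")
# pipeline = FuseDiTPipeline.from_pretrained(path).to("cuda")
```

The returned path can be passed to `FuseDiTPipeline.from_pretrained` in place of `/path/to/pipeline/`.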

## Citation

```bibtex
@article{tang2025exploringdeepfusion,
  title={Exploring the Deep Fusion of Large Language Models and Diffusion Transformers for Text-to-Image Synthesis},
  author={Bingda Tang and Boyang Zheng and Xichen Pan and Sayak Paul and Saining Xie},
  year={2025},
  journal={arXiv preprint arXiv:2505.10046},
}
```