---
license: apache-2.0
library_name: diffusers
pipeline_tag: text-to-image
---

# Exploring the Deep Fusion of Large Language Models and Diffusion Transformers for Text-to-Image Synthesis

## Resources

- [arXiv: Paper](https://arxiv.org/pdf/2505.10046)
- [GitHub: Code](https://github.com/tang-bd/fuse-dit)

## Quick Start

Download the pre-trained model, then use `FuseDiTPipeline` from our codebase to run inference:

```python
import torch
from diffusion.pipelines import FuseDiTPipeline

# Load the pipeline from a local directory and move it to the GPU.
pipeline = FuseDiTPipeline.from_pretrained("/path/to/pipeline/").to("cuda")

# Generate a 512x512 image; [0][0] selects the first image of the first output field.
image = pipeline(
    "your prompt",
    width=512,
    height=512,
    num_inference_steps=25,
    guidance_scale=6.0,
    use_cache=True,
)[0][0]

image.save("test.png")
```

## Citation

```bibtex
@article{tang2025exploringdeepfusion,
  title={Exploring the Deep Fusion of Large Language Models and Diffusion Transformers for Text-to-Image Synthesis},
  author={Bingda Tang and Boyang Zheng and Xichen Pan and Sayak Paul and Saining Xie},
  year={2025},
  journal={arXiv preprint arXiv:2505.10046},
}
```