---
license: apache-2.0
library_name: diffusers
pipeline_tag: text-to-image
---

# Exploring the Deep Fusion of Large Language Models and Diffusion Transformers for Text-to-Image Synthesis

<div align="center">
<img src="https://github.com/tang-bd/fuse-dit/blob/main/assets/visual.jpg?raw=true" width="95%"/>
</div>

## Resources
- [arXiv: Paper](https://arxiv.org/pdf/2505.10046)
- [GitHub: Code](https://github.com/tang-bd/fuse-dit)

## Quick Start
Download the pre-trained pipeline, then run inference with `FuseDiTPipeline`. This class is provided by our codebase (`diffusion.pipelines`), not by the `diffusers` package:

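```python
import torch
from diffusion.pipelines import FuseDiTPipeline

# Load the pipeline from a local directory and move it to the GPU.
pipeline = FuseDiTPipeline.from_pretrained("/path/to/pipeline/").to("cuda")

# Generate a 512x512 image with 25 inference steps and a guidance scale of 6.0.
image = pipeline(
    "your prompt",
    width=512,
    height=512,
    num_inference_steps=25,
    guidance_scale=6.0,
    use_cache=True,
)[0][0]  # index into the pipeline output to get the first generated image

image.save("test.png")
```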

## Citation

```bibtex
@article{tang2025exploringdeepfusion,
  title={Exploring the Deep Fusion of Large Language Models and Diffusion Transformers for Text-to-Image Synthesis},
  author={Bingda Tang and Boyang Zheng and Xichen Pan and Sayak Paul and Saining Xie},
  year={2025},
  journal={arXiv preprint arXiv:2505.10046},
}
```