Exploring the Deep Fusion of Large Language Models and Diffusion Transformers for Text-to-Image Synthesis Paper • 2505.10046 • Published 13 days ago • 9
PISA Experiments: Exploring Physics Post-Training for Video Diffusion Models by Watching Stuff Drop Paper • 2503.09595 • Published Mar 12
BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset Paper • 2505.09568 • Published 14 days ago • 85