MIGE: A Unified Framework for Multimodal Instruction-Based Image Generation and Editing • Paper • 2502.21291 • Published 9 days ago
Multimodal Representation Alignment for Image Generation: Text-Image Interleaved Control Is Easier Than You Think • Paper • 2502.20172 • Published 11 days ago
FantasyID: Face Knowledge Enhanced ID-Preserving Video Generation • Paper • 2502.13995 • Published 19 days ago
VFX Creator: Animated Visual Effect Generation with Controllable Diffusion Transformer • Paper • 2502.05979 • Published 28 days ago
EchoVideo: Identity-Preserving Human Video Generation by Multimodal Feature Fusion • Paper • 2501.13452 • Published Jan 23
Can We Generate Images with CoT? Let's Verify and Reinforce Image Generation Step by Step • Paper • 2501.13926 • Published Jan 23
ConceptMaster: Multi-Concept Video Customization on Diffusion Transformer Models Without Test-Time Tuning • Paper • 2501.04698 • Published Jan 8
Multi-subject Open-set Personalization in Video Generation • Paper • 2501.06187 • Published Jan 10
Ingredients: Blending Custom Photos with Video Diffusion Transformers • Paper • 2501.01790 • Published Jan 3
VideoMaker: Zero-shot Customized Video Generation with the Inherent Force of Video Diffusion Models • Paper • 2412.19645 • Published Dec 27, 2024