MIGE: A Unified Framework for Multimodal Instruction-Based Image Generation and Editing Paper • 2502.21291 • Published 9 days ago • 4
Multimodal Representation Alignment for Image Generation: Text-Image Interleaved Control Is Easier Than You Think Paper • 2502.20172 • Published 11 days ago • 26
FantasyID: Face Knowledge Enhanced ID-Preserving Video Generation Paper • 2502.13995 • Published 19 days ago • 8
Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model Paper • 2502.10248 • Published 24 days ago • 51
VFX Creator: Animated Visual Effect Generation with Controllable Diffusion Transformer Paper • 2502.05979 • Published 28 days ago • 8
EchoVideo: Identity-Preserving Human Video Generation by Multimodal Feature Fusion Paper • 2501.13452 • Published Jan 23 • 7
Can We Generate Images with CoT? Let's Verify and Reinforce Image Generation Step by Step Paper • 2501.13926 • Published Jan 23 • 37
Multi-subject Open-set Personalization in Video Generation Paper • 2501.06187 • Published Jan 10 • 14
ConceptMaster: Multi-Concept Video Customization on Diffusion Transformer Models Without Test-Time Tuning Paper • 2501.04698 • Published Jan 8 • 14
Ingredients: Blending Custom Photos with Video Diffusion Transformers Paper • 2501.01790 • Published Jan 3 • 8
VideoMaker: Zero-shot Customized Video Generation with the Inherent Force of Video Diffusion Models Paper • 2412.19645 • Published Dec 27, 2024 • 13
Mind the Time: Temporally-Controlled Multi-Event Video Generation Paper • 2412.05263 • Published Dec 6, 2024 • 11
🎬 Video models Collection text-to-video & image-to-video models released by the Chinese community • 22 items • Updated 5 days ago • 4
Trending Papers - November ✨ Collection Most upvoted paper on the Daily Papers • 10 items • Updated 5 days ago • 3
Cycle3D: High-quality and Consistent Image-to-3D Generation via Generation-Reconstruction Cycle Paper • 2407.19548 • Published Jul 28, 2024 • 26
ViewCrafter: Taming Video Diffusion Models for High-fidelity Novel View Synthesis Paper • 2409.02048 • Published Sep 3, 2024 • 3
VideoGen-of-Thought: A Collaborative Framework for Multi-Shot Video Generation Paper • 2412.02259 • Published Dec 3, 2024 • 58
Open-Sora Plan: Open-Source Large Video Generation Model Paper • 2412.00131 • Published Nov 28, 2024 • 33
WF-VAE: Enhancing Video VAE by Wavelet-Driven Energy Flow for Latent Video Diffusion Model Paper • 2411.17459 • Published Nov 26, 2024 • 11