Beyond Next-Token: Next-X Prediction for Autoregressive Visual Generation • arXiv:2502.20388 • Published Feb 2025
FlexiDiT: Your Diffusion Transformer Can Easily Generate High-Quality Samples with Less Compute • arXiv:2502.20126 • Published Feb 2025
How far can we go with ImageNet for Text-to-Image generation? • arXiv:2502.21318 • Published Feb 2025
VideoUFO: A Million-Scale User-Focused Dataset for Text-to-Video Generation • arXiv:2503.01739 • Published Mar 2025
Magic 1-For-1: Generating One Minute Video Clips within One Minute • arXiv:2502.07701 • Published Feb 2025
Animate Anyone 2: High-Fidelity Character Image Animation with Environment Affordance • arXiv:2502.06145 • Published Feb 2025
Light-A-Video: Training-free Video Relighting via Progressive Light Fusion • arXiv:2502.08590 • Published Feb 2025
OmniHuman-1: Rethinking the Scaling-Up of One-Stage Conditioned Human Animation Models • arXiv:2502.01061 • Published Feb 3, 2025
SANA 1.5: Efficient Scaling of Training-Time and Inference-Time Compute in Linear Diffusion Transformer • arXiv:2501.18427 • Published Jan 30, 2025
Fast Encoder-Based 3D from Casual Videos via Point Track Processing • arXiv:2404.07097 • Published Apr 10, 2024
Magic Mirror: ID-Preserved Video Generation in Video Diffusion Transformers • arXiv:2501.03931 • Published Jan 7, 2025
Diffusion as Shader: 3D-aware Video Diffusion for Versatile Video Generation Control • arXiv:2501.03847 • Published Jan 7, 2025
Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos • arXiv:2501.04001 • Published Jan 7, 2025
VMix: Improving Text-to-Image Diffusion Model with Cross-Attention Mixing Control • arXiv:2412.20800 • Published Dec 30, 2024
VideoAnydoor: High-fidelity Video Object Insertion with Precise Motion Control • arXiv:2501.01427 • Published Jan 2, 2025