One-Prompt-One-Story: Free-Lunch Consistent Text-to-Image Generation Using a Single Prompt Paper • 2501.13554 • Published 5 days ago • 8
Can We Generate Images with CoT? Let's Verify and Reinforce Image Generation Step by Step Paper • 2501.13926 • Published 5 days ago • 27
VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding Paper • 2501.13106 • Published 6 days ago • 71
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning Paper • 2501.12948 • Published 6 days ago • 245
Go-with-the-Flow: Motion-Controllable Video Diffusion Models Using Real-Time Warped Noise Paper • 2501.08331 • Published 14 days ago • 19
Hunyuan3D 2.0: Scaling Diffusion Models for High Resolution Textured 3D Assets Generation Paper • 2501.12202 • Published 7 days ago • 27
SEAL: Entangled White-box Watermarks on Low-Rank Adaptation Paper • 2501.09284 • Published 13 days ago • 10
FramePainter: Endowing Interactive Image Editing with Video Diffusion Priors Paper • 2501.08225 • Published 14 days ago • 18
MangaNinja: Line Art Colorization with Precise Reference Following Paper • 2501.08332 • Published 14 days ago • 55
LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs Paper • 2501.06186 • Published 18 days ago • 59
LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token Paper • 2501.03895 • Published 21 days ago • 48
Sana Collection ⚡️Sana: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer • 19 items • Updated 21 days ago • 87
Diffusion as Shader: 3D-aware Video Diffusion for Versatile Video Generation Control Paper • 2501.03847 • Published 21 days ago • 23
Cosmos World Foundation Model Platform for Physical AI Paper • 2501.03575 • Published 22 days ago • 67
STAR: Spatial-Temporal Augmentation with Text-to-Video Models for Real-World Video Super-Resolution Paper • 2501.02976 • Published 22 days ago • 52
TransPixar: Advancing Text-to-Video Generation with Transparency Paper • 2501.03006 • Published 22 days ago • 23
SLAM3R: Real-Time Dense Scene Reconstruction from Monocular RGB Videos Paper • 2412.09401 • Published Dec 12, 2024 • 2
Nested Attention: Semantic-aware Attention Values for Concept Personalization Paper • 2501.01407 • Published 26 days ago • 11
SeedVR: Seeding Infinity in Diffusion Transformer Towards Generic Video Restoration Paper • 2501.01320 • Published 26 days ago • 11