DiffRhythm: Blazingly Fast and Embarrassingly Simple End-to-End Full-Length Song Generation with Latent Diffusion Paper • 2503.01183 • Published 7 days ago • 26
Tell me why: Visual foundation models as self-explainable classifiers Paper • 2502.19577 • Published 11 days ago • 10
Mobius: Text to Seamless Looping Video Generation via Latent Shift Paper • 2502.20307 • Published 10 days ago • 16
FlexiDiT: Your Diffusion Transformer Can Easily Generate High-Quality Samples with Less Compute Paper • 2502.20126 • Published 11 days ago • 19
SpargeAttn: Accurate Sparse Attention Accelerating Any Model Inference Paper • 2502.18137 • Published 13 days ago • 51
ART: Anonymous Region Transformer for Variable Multi-Layer Transparent Image Generation Paper • 2502.18364 • Published 12 days ago • 32
KV-Edit: Training-Free Image Editing for Precise Background Preservation Paper • 2502.17363 • Published 13 days ago • 32
PhotoDoodle: Learning Artistic Image Editing from Few-Shot Pairwise Data Paper • 2502.14397 • Published 18 days ago • 38
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features Paper • 2502.14786 • Published 17 days ago • 128
ImageRAG: Dynamic Image Retrieval for Reference-Guided Image Generation Paper • 2502.09411 • Published 25 days ago • 18
Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model Paper • 2502.10248 • Published 24 days ago • 51
Introducing Three New Serverless Inference Providers: Hyperbolic, Nebius AI Studio, and Novita 🔥 Article • 20 days ago • 93
Diffusion-Sharpening: Fine-tuning Diffusion Models with Denoising Trajectory Sharpening Paper • 2502.12146 • Published 20 days ago • 16
I Think, Therefore I Diffuse: Enabling Multimodal In-Context Reasoning in Diffusion Models Paper • 2502.10458 • Published 26 days ago • 30