Marigold: Affordable Adaptation of Diffusion-Based Image Generators for Image Analysis Paper • 2505.09358 • Published 14 days ago • 24
BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset Paper • 2505.09568 • Published 14 days ago • 85
MiniMax-Speech: Intrinsic Zero-Shot Text-to-Speech with a Learnable Speaker Encoder Paper • 2505.07916 • Published 16 days ago • 118
UniVLA: Learning to Act Anywhere with Task-centric Latent Actions Paper • 2505.06111 • Published 19 days ago • 24
StreamBridge: Turning Your Offline Video Large Language Model into a Proactive Streaming Assistant Paper • 2505.05467 • Published 20 days ago • 13
T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT Paper • 2505.00703 • Published 27 days ago • 41
TeLoGraF: Temporal Logic Planning via Graph-encoded Flow Matching Paper • 2505.00562 • Published 27 days ago • 3
Improving Editability in Image Generation with Layer-wise Memory Paper • 2505.01079 • Published 26 days ago • 28
PixelHacker: Image Inpainting with Structural and Semantic Consistency Paper • 2504.20438 • Published 29 days ago • 42
COMPACT: COMPositional Atomic-to-Complex Visual Capability Tuning Paper • 2504.21850 • Published 28 days ago • 26
UniversalRAG: Retrieval-Augmented Generation over Multiple Corpora with Diverse Modalities and Granularities Paper • 2504.20734 • Published 29 days ago • 61
YoChameleon: Personalized Vision and Language Generation Paper • 2504.20998 • Published 29 days ago • 11
Distilling semantically aware orders for autoregressive image generation Paper • 2504.17069 • Published Apr 23 • 6
ViSMaP: Unsupervised Hour-long Video Summarisation by Meta-Prompting Paper • 2504.15921 • Published Apr 22 • 7
3DV-TON: Textured 3D-Guided Consistent Video Try-on via Diffusion Models Paper • 2504.17414 • Published Apr 24 • 17
TimeChat-Online: 80% Visual Tokens are Naturally Redundant in Streaming Videos Paper • 2504.17343 • Published Apr 24 • 11
Boosting Generative Image Modeling via Joint Image-Feature Synthesis Paper • 2504.16064 • Published Apr 22 • 14