SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction Paper • 2507.15852 • Published 29 days ago • 38
MoVieS: Motion-Aware 4D Dynamic View Synthesis in One Second Paper • 2507.10065 • Published Jul 14 • 24
MOSPA: Human Motion Generation Driven by Spatial Audio Paper • 2507.11949 • Published Jul 16 • 23
FantasyPortrait: Enhancing Multi-Character Portrait Animation with Expression-Augmented Diffusion Transformers Paper • 2507.12956 • Published Jul 17 • 24
Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens Paper • 2506.17218 • Published Jun 20 • 27
Wait, We Don't Need to "Wait"! Removing Thinking Tokens Improves Reasoning Efficiency Paper • 2506.08343 • Published Jun 10 • 49
Stream-Omni: Simultaneous Multimodal Interactions with Large Language-Vision-Speech Model Paper • 2506.13642 • Published Jun 16 • 27
Frame Guidance: Training-Free Guidance for Frame-Level Control in Video Diffusion Models Paper • 2506.07177 • Published Jun 8 • 22
Seeing Voices: Generating A-Roll Video from Audio with Mirage Paper • 2506.08279 • Published Jun 9 • 28
Article 🦸🏻#14: What Is MCP, and Why Is Everyone – Suddenly! – Talking About It? By Kseniase • Mar 17 • 328
CVPR 2025 Collection A collection of models and demos linked to papers presented at CVPR 2025. • 14 items • Updated Jun 11 • 1
Peer-Ranked Precision: Creating a Foundational Dataset for Fine-Tuning Vision Models from DataSeeds' Annotated Imagery Paper • 2506.05673 • Published Jun 6 • 10
PartCrafter: Structured 3D Mesh Generation via Compositional Latent Diffusion Transformers Paper • 2506.05573 • Published Jun 5 • 78
SpatialLM: Training Large Language Models for Structured Indoor Modeling Paper • 2506.07491 • Published Jun 9 • 48