Tora2: Motion and Appearance Customized Diffusion Transformer for Multi-Entity Video Generation Paper • 2507.05963 • Published 4 days ago
StreamVLN: Streaming Vision-and-Language Navigation via SlowFast Context Modeling Paper • 2507.05240 • Published 5 days ago
OmniPart: Part-Aware 3D Generation with Semantic Decoupling and Structural Cohesion Paper • 2507.06165 • Published 4 days ago
Perception-Aware Policy Optimization for Multimodal Reasoning Paper • 2507.06448 • Published 4 days ago
Go to Zero: Towards Zero-shot Motion Generation with Million-scale Data Paper • 2507.07095 • Published 3 days ago
LangSplatV2: High-dimensional 3D Language Gaussian Splatting with 450+ FPS Paper • 2507.07136 • Published 4 days ago
Geometry Forcing: Marrying Video Diffusion and 3D Representation for Consistent World Modeling Paper • 2507.07982 • Published 2 days ago
4D-LRM: Large Space-Time Reconstruction Model From and To Any View at Any Time Paper • 2506.18890 • Published 19 days ago
Auto-Regressively Generating Multi-View Consistent Images Paper • 2506.18527 • Published 19 days ago
4Real-Video-V2: Fused View-Time Attention and Feedforward Reconstruction for 4D Scene Generation Paper • 2506.18839 • Published 24 days ago
3D Arena: An Open Platform for Generative 3D Evaluation Paper • 2506.18787 • Published 19 days ago
DIP: Unsupervised Dense In-Context Post-training of Visual Representations Paper • 2506.18463 • Published 19 days ago
ViDAR: Video Diffusion-Aware 4D Reconstruction From Monocular Inputs Paper • 2506.18792 • Published 19 days ago
OmniGen2: Exploration to Advanced Multimodal Generation Paper • 2506.18871 • Published 19 days ago
Light of Normals: Unified Feature Representation for Universal Photometric Stereo Paper • 2506.18882 • Published 19 days ago