FantasyVLN: Unified Multimodal Chain-of-Thought Reasoning for Vision-Language Navigation Paper • 2601.13976 • Published 10 days ago • 21
MVInverse: Feed-forward Multi-view Inverse Rendering in Seconds Paper • 2512.21003 • Published Dec 24, 2025 • 2
Sharp Monocular View Synthesis in Less Than a Second Paper • 2512.10685 • Published Dec 11, 2025 • 28
SpeakerVid-5M: A Large-Scale High-Quality Dataset for Audio-Visual Dyadic Interactive Human Generation Paper • 2507.09862 • Published Jul 14, 2025 • 51