Learning Video Representations without Natural Videos Paper • 2410.24213 • Published Oct 31, 2024 • 16
Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens Paper • 2506.17218 • Published 6 days ago • 15
GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmentation Paper • 2504.07962 • Published Apr 10