MINT-CoT: Enabling Interleaved Visual Tokens in Mathematical Chain-of-Thought Reasoning • Paper • 2506.05331 • Published 13 days ago • 13
VisionReasoner: Unified Visual Perception and Reasoning via Reinforcement Learning • Paper • 2505.12081 • Published May 17 • 17
Training-Free Efficient Video Generation via Dynamic Token Carving • Paper • 2505.16864 • Published 27 days ago • 21
DeCLIP: Decoupled Learning for Open-Vocabulary Dense Perception • Paper • 2505.04410 • Published May 7 • 44
Perception, Reason, Think, and Plan: A Survey on Large Multimodal Reasoning Models • Paper • 2505.04921 • Published May 8 • 176
MotionLab: Unified Human Motion Generation and Editing via the Motion-Condition-Motion Paradigm • Paper • 2502.02358 • Published Feb 4 • 18
LISA++: An Improved Baseline for Reasoning Segmentation with Large Language Model • Paper • 2312.17240 • Published Dec 28, 2023 • 1