LLaDA-V: Large Language Diffusion Models with Visual Instruction Tuning • Paper • 2505.16933 • Published 2 days ago • 23
GoT-R1: Unleashing Reasoning Capability of MLLM for Visual Generation with Reinforcement Learning • Paper • 2505.17022 • Published 2 days ago • 24
Pixel Reasoner: Incentivizing Pixel-Space Reasoning with Curiosity-Driven Reinforcement Learning • Paper • 2505.15966 • Published 3 days ago • 41
Scaling Reasoning, Losing Control: Evaluating Instruction Following in Large Reasoning Models • Paper • 2505.14810 • Published 4 days ago • 52
Think or Not? Selective Reasoning via Reinforcement Learning for Vision-Language Models • Paper • 2505.16854 • Published 2 days ago • 9
LaViDa: A Large Diffusion Language Model for Multimodal Understanding • Paper • 2505.16839 • Published 2 days ago • 10
SpatialScore: Towards Unified Evaluation for Multimodal Spatial Understanding • Paper • 2505.17012 • Published 2 days ago • 10
UniVG-R1: Reasoning Guided Universal Visual Grounding with Reinforcement Learning • Paper • 2505.14231 • Published 4 days ago • 48
Vid2World: Crafting Video Diffusion Models to Interactive World Models • Paper • 2505.14357 • Published 4 days ago • 20
Visionary-R1: Mitigating Shortcuts in Visual Reasoning with Reinforcement Learning • Paper • 2505.14677 • Published 4 days ago • 14
Cosmos-Reason1 • Collection • Multimodal world understanding through reasoning • 5 items • Updated 3 days ago • 21
VisionReasoner: Unified Visual Perception and Reasoning via Reinforcement Learning • Paper • 2505.12081 • Published 7 days ago • 15
Perception, Reason, Think, and Plan: A Survey on Large Multimodal Reasoning Models • Paper • 2505.04921 • Published 17 days ago • 144
Bring Reason to Vision: Understanding Perception and Reasoning through Model Merging • Paper • 2505.05464 • Published 16 days ago • 10
OpenThinkIMG: Learning to Think with Images via Visual Tool Reinforcement Learning • Paper • 2505.08617 • Published 11 days ago • 39