InfiniDepth: Arbitrary-Resolution and Fine-Grained Depth Estimation with Neural Implicit Fields • Paper • 2601.03252 • Published 1 day ago • 74
VINO: A Unified Visual Generator with Interleaved OmniModal Context • Paper • 2601.02358 • Published 2 days ago • 23
SpatialTree: How Spatial Abilities Branch Out in MLLMs • Paper • 2512.20617 • Published 15 days ago • 42
Qwen-Image-Layered: Towards Inherent Editability via Layer Decomposition • Paper • 2512.15603 • Published 21 days ago • 59
EgoX: Egocentric Video Generation from a Single Exocentric Video • Paper • 2512.08269 • Published 30 days ago • 116
EditThinker: Unlocking Iterative Reasoning for Any Image Editor • Paper • 2512.05965 • Published Dec 5, 2025 • 38
PaperDebugger: A Plugin-Based Multi-Agent System for In-Editor Academic Writing, Review, and Editing • Paper • 2512.02589 • Published Dec 2, 2025 • 68
Thinking with Programming Vision: Towards a Unified View for Thinking with Images • Paper • 2512.03746 • Published Dec 3, 2025 • 16
OneThinker: All-in-one Reasoning Model for Image and Video • Paper • 2512.03043 • Published Dec 2, 2025 • 32
DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models • Paper • 2512.02556 • Published Dec 2, 2025 • 245
OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe • Paper • 2511.16334 • Published Nov 20, 2025 • 92
Depth Anything 3: Recovering the Visual Space from Any Views • Paper • 2511.10647 • Published Nov 13, 2025 • 96
Lumine: An Open Recipe for Building Generalist Agents in 3D Open Worlds • Paper • 2511.08892 • Published Nov 12, 2025 • 202
Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm • Paper • 2511.04570 • Published Nov 6, 2025 • 211