Haoyu Guo's picture

71 3

Haoyu Guo

ghy0324

·

AI & ML interests

None yet

Recent Activity

upvoted a paper 1 day ago

InfiniDepth: Arbitrary-Resolution and Fine-Grained Depth Estimation with Neural Implicit Fields

upvoted a paper 2 days ago

VINO: A Unified Visual Generator with Interleaved OmniModal Context

upvoted a paper 15 days ago

SpatialTree: How Spatial Abilities Branch Out in MLLMs

View all activity

Organizations

upvoted a paper 1 day ago

InfiniDepth: Arbitrary-Resolution and Fine-Grained Depth Estimation with Neural Implicit Fields

Paper • 2601.03252 • Published 1 day ago • 74

upvoted a paper 2 days ago

VINO: A Unified Visual Generator with Interleaved OmniModal Context

Paper • 2601.02358 • Published 2 days ago • 23

upvoted a paper 15 days ago

SpatialTree: How Spatial Abilities Branch Out in MLLMs

Paper • 2512.20617 • Published 15 days ago • 42

upvoted 2 papers 21 days ago

Qwen-Image-Layered: Towards Inherent Editability via Layer Decomposition

Paper • 2512.15603 • Published 21 days ago • 59

Step-GUI Technical Report

Paper • 2512.15431 • Published 22 days ago • 129

upvoted a paper 23 days ago

EgoX: Egocentric Video Generation from a Single Exocentric Video

Paper • 2512.08269 • Published 30 days ago • 116

upvoted 6 papers about 1 month ago

EditThinker: Unlocking Iterative Reasoning for Any Image Editor

Paper • 2512.05965 • Published Dec 5, 2025 • 38

PaperDebugger: A Plugin-Based Multi-Agent System for In-Editor Academic Writing, Review, and Editing

Paper • 2512.02589 • Published Dec 2, 2025 • 68

Thinking with Programming Vision: Towards a Unified View for Thinking with Images

Paper • 2512.03746 • Published Dec 3, 2025 • 16

OneThinker: All-in-one Reasoning Model for Image and Video

Paper • 2512.03043 • Published Dec 2, 2025 • 32

Qwen3-VL Technical Report

Paper • 2511.21631 • Published Nov 26, 2025 • 149

DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models

Paper • 2512.02556 • Published Dec 2, 2025 • 245

upvoted 8 papers about 2 months ago

OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe

Paper • 2511.16334 • Published Nov 20, 2025 • 92

SAM 3: Segment Anything with Concepts

Paper • 2511.16719 • Published Nov 20, 2025 • 126

SAM 3D: 3Dfy Anything in Images

Paper • 2511.16624 • Published Nov 20, 2025 • 110

Depth Anything 3: Recovering the Visual Space from Any Views

Paper • 2511.10647 • Published Nov 13, 2025 • 96

Lumine: An Open Recipe for Building Generalist Agents in 3D Open Worlds

Paper • 2511.08892 • Published Nov 12, 2025 • 202

Cambrian-S: Towards Spatial Supersensing in Video

Paper • 2511.04670 • Published Nov 6, 2025 • 37

V-Thinker: Interactive Thinking with Images

Paper • 2511.04460 • Published Nov 6, 2025 • 97

Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm

Paper • 2511.04570 • Published Nov 6, 2025 • 211