Xudong Xu

Sheldoooon

https://sheldontsui.github.io/

SheldonTsui

AI & ML interests

AIGC for Embodied AI

Recent Activity

upvoted a paper 20 days ago

RoboVIP: Multi-View Video Generation with Visual Identity Prompting Augments Robot Manipulation

Paper • 2601.05241 • Published 20 days ago • 24

upvoted 11 papers 4 months ago

MesaTask: Towards Task-Driven Tabletop Scene Generation via 3D Spatial Reasoning

Paper • 2509.22281 • Published Sep 26, 2025 • 33

MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing

Paper • 2509.22186 • Published Sep 26, 2025 • 140

SceneWeaver: All-in-One 3D Scene Synthesis with an Extensible and Self-Reflective Agent

Paper • 2509.20414 • Published Sep 24, 2025 • 10

Hunyuan3D-Omni: A Unified Framework for Controllable Generation of 3D Assets

Paper • 2509.21245 • Published Sep 25, 2025 • 39

PhysCtrl: Generative Physics for Controllable and Physics-Grounded Video Generation

Paper • 2509.20358 • Published Sep 24, 2025 • 15

Video models are zero-shot learners and reasoners

Paper • 2509.20328 • Published Sep 24, 2025 • 100

How Far are VLMs from Visual Spatial Intelligence? A Benchmark-Driven Perspective

Paper • 2509.18905 • Published Sep 23, 2025 • 30

Qwen3-Omni Technical Report

Paper • 2509.17765 • Published Sep 22, 2025 • 146

Hunyuan3D Studio: End-to-End AI Pipeline for Game-Ready 3D Asset Generation

Paper • 2509.12815 • Published Sep 16, 2025 • 40

InternScenes: A Large-scale Simulatable Indoor Scene Dataset with Realistic Layouts

Paper • 2509.10813 • Published Sep 13, 2025 • 31

OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling

Paper • 2509.12201 • Published Sep 15, 2025 • 106

upvoted 8 papers 5 months ago

EmbodiedOneVision: Interleaved Vision-Text-Action Pretraining for General Robot Control

Paper • 2508.21112 • Published Aug 28, 2025 • 77

Mixture of Contexts for Long Video Generation

Paper • 2508.21058 • Published Aug 28, 2025 • 35

FastMesh: Efficient Artistic Mesh Generation via Component Decoupling

Paper • 2508.19188 • Published Aug 26, 2025 • 17

InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency

Paper • 2508.18265 • Published Aug 25, 2025 • 212
