Xuehui Wang

huiserwang

https://huiserwang.site

AI & ML interests

Segmentation

Recent Activity

upvoted 2 papers about 1 month ago

Stepping VLMs onto the Court: Benchmarking Spatial Intelligence in Sports

Paper • 2603.09896 • Published Mar 10 • 27

InternVL-U: Democratizing Unified Multimodal Models for Understanding, Reasoning, Generation and Editing

Paper • 2603.09877 • Published Mar 10 • 48

upvoted a paper 4 months ago

Visionary: The World Model Carrier Built on WebGPU-Powered Gaussian Splatting Platform

Paper • 2512.08478 • Published Dec 9, 2025 • 77

upvoted a paper 5 months ago

MiroThinker: Pushing the Performance Boundaries of Open-Source Research Agents via Model, Context, and Interactive Scaling

Paper • 2511.11793 • Published Nov 14, 2025 • 195

upvoted 2 papers 6 months ago

MM-HELIX: Boosting Multimodal Long-Chain Reflective Reasoning with Holistic Platform and Adaptive Hybrid Policy Optimization

Paper • 2510.08540 • Published Oct 9, 2025 • 110

NaViL: Rethinking Scaling Properties of Native Multimodal Large Language Models under Data Constraints

Paper • 2510.08565 • Published Oct 9, 2025 • 21

upvoted a paper 9 months ago

MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents

Paper • 2507.19478 • Published Jul 25, 2025 • 33

upvoted a paper 10 months ago

SmolVLM: Redefining small and efficient multimodal models

Paper • 2504.05299 • Published Apr 7, 2025 • 207

upvoted 3 articles 10 months ago

A Dive into Vision-Language Models

Article • Feb 3, 2023

Vision Language Models Explained

Article • Apr 11, 2024 • 529

Vision Language Models (Better, faster, stronger)

Article • May 12, 2025 • 605

upvoted a paper 11 months ago

ZeroGUI: Automating Online GUI Learning at Zero Human Cost

Paper • 2505.23762 • Published May 29, 2025 • 45

upvoted 3 papers about 1 year ago

InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models

Paper • 2504.10479 • Published Apr 14, 2025 • 308

Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing

Paper • 2504.02826 • Published Apr 3, 2025 • 68

Dita: Scaling Diffusion Transformer for Generalist Vision-Language-Action Policy

Paper • 2503.19757 • Published Mar 25, 2025 • 51

upvoted 2 papers over 1 year ago

SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding

Paper • 2412.09604 • Published Dec 12, 2024 • 38

Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling

Paper • 2412.05271 • Published Dec 6, 2024 • 160

upvoted a paper almost 2 years ago

Needle In A Multimodal Haystack

Paper • 2406.07230 • Published Jun 11, 2024 • 55
