Xiangtai Li

LXT

https://lxtgh.github.io/

AI & ML interests

Computer Vision, Multi-Modal Understanding, Generative AI

Recent Activity

upvoted a paper 3 days ago

Molmo2: Open Weights and Data for Vision-Language Models with Video Understanding and Grounding

upvoted a paper 3 days ago

STEP3-VL-10B Technical Report

upvoted a paper 6 days ago

BabyVision: Visual Reasoning Beyond Language

Organizations

upvoted 2 papers 3 days ago

Molmo2: Open Weights and Data for Vision-Language Models with Video Understanding and Grounding

Paper • 2601.10611 • Published 4 days ago • 24

STEP3-VL-10B Technical Report

Paper • 2601.09668 • Published 5 days ago • 169

upvoted 2 papers 6 days ago

BabyVision: Visual Reasoning Beyond Language

Paper • 2601.06521 • Published 9 days ago • 184

Watching, Reasoning, and Searching: A Video Deep Research Benchmark on Open Web for Agentic Video Reasoning

Paper • 2601.06943 • Published 8 days ago • 202

upvoted 2 papers about 1 month ago

Vision-Language-Action Models for Autonomous Driving: Past, Present, and Future

Paper • 2512.16760 • Published Dec 18, 2025 • 13

LLaDA2.0: Scaling Up Diffusion Language Models to 100B

Paper • 2512.15745 • Published Dec 10, 2025 • 78

liked a model about 1 month ago

WeiChow/EditMGT

Image-to-Image • Updated about 1 month ago • 8

upvoted a paper about 1 month ago

RecTok: Reconstruction Distillation along Rectified Flow

Paper • 2512.13421 • Published Dec 15, 2025 • 4

authored 12 papers about 1 month ago

DenseWorld-1M: Towards Detailed Dense Grounded Caption in the Real World

Paper • 2506.24102 • Published Jun 30, 2025

One Flight Over the Gap: A Survey from Perspective to Panoramic Vision

Paper • 2509.04444 • Published Sep 4, 2025

VimoRAG: Video-based Retrieval-augmented 3D Motion Generation for Motion Language Models

Paper • 2508.12081 • Published Aug 16, 2025

DiT360: High-Fidelity Panoramic Image Generation via Hybrid Training

Paper • 2510.11712 • Published Oct 13, 2025 • 30

Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs

Paper • 2510.18876 • Published Oct 21, 2025 • 36

Are Video Models Ready as Zero-Shot Reasoners? An Empirical Study with the MME-CoF Benchmark

Paper • 2510.26802 • Published Oct 30, 2025 • 33

Visual Spatial Tuning

Paper • 2511.05491 • Published Nov 7, 2025 • 51

Towards Open Vocabulary Learning: A Survey

Paper • 2306.15880 • Published Jun 28, 2023

RobuRCDet: Enhancing Robustness of Radar-Camera Fusion in Bird's Eye View for 3D Object Detection

Paper • 2502.13071 • Published Feb 18, 2025
