Xiangtai Li

LXT

https://lxtgh.github.io/

AI & ML interests

Computer Vision, Multi-Modal Understanding, Generative AI

Recent Activity

authored a paper 6 days ago

The 1st Solution for 7th LSVOS RVOS Track: SaSaSa2VA

authored a paper 6 days ago

LSVOS 2025 Challenge Report: Recent Advances in Complex Video Object Segmentation

authored a paper 6 days ago

RMP-SAM: Towards Real-Time Multi-Purpose Segment Anything

commented a paper about 1 year ago

On Path to Multimodal Generalist: General-Level and General-Bench

Paper • 2505.04620 • Published May 7, 2025 • 83

New activity in General-Level/General-Bench-Openset about 1 year ago

Delete video/comrehension

#9 opened about 1 year ago by HarborYuan

New activity in General-Level/General-Bench-Closeset about 1 year ago

Delete closeset

#4 opened about 1 year ago by QingyuShi

commented 4 papers about 1 year ago

The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer

Paper • 2504.10462 • Published Apr 14, 2025 • 16

The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer

Paper • 2504.10462 • Published Apr 14, 2025 • 16

Pixel-SAIL: Single Transformer For Pixel-Grounded Understanding

Paper • 2504.10465 • Published Apr 14, 2025 • 27

Pixel-SAIL: Single Transformer For Pixel-Grounded Understanding

Paper • 2504.10465 • Published Apr 14, 2025 • 27

New activity in ByteDance/Sa2VA-4B over 1 year ago

Dependency conflicts

#4 opened over 1 year ago by tbomez

commented 5 papers over 1 year ago

Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos

Paper • 2501.04001 • Published Jan 7, 2025 • 49

DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation

Paper • 2412.07589 • Published Dec 10, 2024 • 48

DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation

Paper • 2412.07589 • Published Dec 10, 2024 • 48

EMOv2: Pushing 5M Vision Model Frontier

Paper • 2412.06674 • Published Dec 9, 2024 • 13

Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis

Paper • 2410.08261 • Published Oct 10, 2024 • 52

commented 5 papers almost 2 years ago

OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and Understanding

Paper • 2406.19389 • Published Jun 27, 2024 • 54

OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and Understanding

Paper • 2406.19389 • Published Jun 27, 2024 • 54

Auto Cherry-Picker: Learning from High-quality Generative Data Driven by Language

Paper • 2406.20085 • Published Jun 28, 2024 • 13

OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and Understanding

Paper • 2406.19389 • Published Jun 27, 2024 • 54

OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and Understanding

Paper • 2406.19389 • Published Jun 27, 2024 • 54

New activity in Dense-World/OMG-LLaVA almost 2 years ago

Upload omg_llava_7b_xxl_pretrain_1024image_8gpus.pth

#1 opened almost 2 years ago by LXT

New activity in LXT/OMG_Seg over 2 years ago

Apply for community grant: Academic project (gpu)

#2 opened over 2 years ago by LXT
