Tony Zhao

tianchez

https://www.tianchez.com

AI & ML interests

Multimodal & Generative AI

Recent Activity

authored a paper 20 days ago

VLM-R1: A Stable and Generalizable R1-style Large Vision-Language Model

authored a paper 20 days ago

Evaluating and Enhancing LLMs for Multi-turn Text-to-SQL with Multiple Question Types

authored a paper 20 days ago

ImageRAG: Enhancing Ultra High Resolution Remote Sensing Imagery Analysis with ImageRAG

Posts 1

Introducing VLM-R1!

GRPO helped DeepSeek R1 learn to reason. Can it also help VLMs perform better on general computer vision tasks?

The answer is YES, and it generalizes better than SFT. We trained Qwen 2.5 VL 3B on RefCOCO (a visual grounding task) and evaluated it on RefCOCO Val and RefGTA (an OOD task).

https://github.com/om-ai-lab/VLM-R1
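
For readers new to GRPO, here is a rough sketch of the group-relative advantage idea applied to visual grounding. It assumes the reward for each sampled box is its IoU with the ground-truth box. That reward choice, the function names, the sample boxes, and the `eps` value are illustrative assumptions and are not taken from the VLM-R1 repo.

```python
from statistics import mean, pstdev


def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap, clamped at zero when boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of samples for the same prompt.

    Each advantage is (reward - group mean) / (group std + eps).
    Population std is used here; implementations differ on this detail.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical example: one ground-truth box and four sampled predictions.
ground_truth = (10, 10, 50, 50)
predictions = [
    (10, 10, 50, 50),    # exact match
    (20, 20, 60, 60),    # partial overlap
    (12, 8, 48, 52),     # close match
    (100, 100, 120, 120) # no overlap
]
rewards = [iou(p, ground_truth) for p in predictions]
advantages = group_relative_advantages(rewards)
```

Samples that beat the group average get a positive advantage and the rest get a negative one, so no separate value model is needed to score them.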