Qinghong (Kevin) Lin

KevinQHLin

http://qhlin.me/

AI & ML interests

Vision-Language Model, Video Understanding, Human-AI Interaction

Recent Activity

upvoted a paper 17 days ago

Show-o2: Improved Native Unified Multimodal Models

new activity 25 days ago

VideoGUI/VideoGUI-High-Plan:Update README.md

new activity 25 days ago

VideoGUI/VideoGUI-Mid-Plan:Update README.md

authored 2 papers about 1 month ago

Paper2Poster: Towards Multimodal Poster Automation from Scientific Papers

Paper • 2505.21497 • Published May 27 • 105

Think or Not? Selective Reasoning via Reinforcement Learning for Vision-Language Models

Paper • 2505.16854 • Published May 22 • 11

authored 2 papers 4 months ago

VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning

Paper • 2503.13444 • Published Mar 17 • 16

VLog: Video-Language Models by Generative Retrieval of Narration Vocabulary

Paper • 2503.09402 • Published Mar 12 • 8

authored 2 papers 7 months ago

ROICtrl: Boosting Instance Control for Visual Generation

Paper • 2411.17949 • Published Nov 27, 2024 • 88

ShowUI: One Vision-Language-Action Model for GUI Visual Agent

Paper • 2411.17465 • Published Nov 26, 2024 • 88

authored 2 papers about 1 year ago

VideoLLM-online: Online Video Large Language Model for Streaming Video

Paper • 2406.11816 • Published Jun 17, 2024 • 25

VideoGUI: A Benchmark for GUI Automation from Instructional Videos

Paper • 2406.10227 • Published Jun 14, 2024 • 9

authored 4 papers over 1 year ago

COSMO: COntrastive Streamlined MultimOdal Model with Interleaved Pre-Training

Paper • 2401.00849 • Published Jan 1, 2024 • 17

Too Large; Data Reduction for Vision-Language Pre-Training

Paper • 2305.20087 • Published May 31, 2023

EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone

Paper • 2307.05463 • Published Jul 11, 2023 • 11

VisorGPT: Learning Visual Prior via Generative Pre-Training

Paper • 2305.13777 • Published May 23, 2023

authored a paper almost 2 years ago

UniVTG: Towards Unified Video-Language Temporal Grounding

Paper • 2307.16715 • Published Jul 31, 2023 • 11

authored a paper about 2 years ago

AssistGPT: A General Multi-modal Assistant that can Plan, Execute, Inspect, and Learn

Paper • 2306.08640 • Published Jun 14, 2023 • 26