VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction Paper • 2501.01957 • Published Jan 2025
VITA: Towards Open-Source Interactive Omni Multimodal LLM Paper • 2408.05211 • Published Aug 9, 2024
Unleashing the Power of Data Tsunami: A Comprehensive Survey on Data Assessment and Selection for Instruction Tuning of Language Models Paper • 2408.02085 • Published Aug 4, 2024
Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis Paper • 2405.21075 • Published May 31, 2024
MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models Paper • 2306.13394 • Published Jun 23, 2023
MemoChat: Tuning LLMs to Use Memos for Consistent Long-Range Open-Domain Conversation Paper • 2308.08239 • Published Aug 16, 2023
Attention Where It Matters: Rethinking Visual Document Understanding with Selective Region Concentration Paper • 2309.01131 • Published Sep 3, 2023
D3G: Exploring Gaussian Prior for Temporal Sentence Grounding with Glance Annotation Paper • 2308.04197 • Published Aug 8, 2023
Coarse-to-Fine: Learning Compact Discriminative Representation for Single-Stage Image Retrieval Paper • 2308.04008 • Published Aug 8, 2023
Enhancing Unsupervised Video Representation Learning by Decoupling the Scene and the Motion Paper • 2009.05757 • Published Sep 12, 2020
Woodpecker: Hallucination Correction for Multimodal Large Language Models Paper • 2310.16045 • Published Oct 24, 2023
A Challenger to GPT-4V? Early Explorations of Gemini in Visual Expertise Paper • 2312.12436 • Published Dec 19, 2023