A Challenger to GPT-4V? Early Explorations of Gemini in Visual Expertise Paper • 2312.12436 • Published Dec 19, 2023 • 13
Masked Autoencoders are Efficient Class Incremental Learners Paper • 2308.12510 • Published Aug 24, 2023
Enhancing Unsupervised Video Representation Learning by Decoupling the Scene and the Motion Paper • 2009.05757 • Published Sep 12, 2020
Woodpecker: Hallucination Correction for Multimodal Large Language Models Paper • 2310.16045 • Published Oct 24, 2023 • 15
MMICT: Boosting Multi-Modal Fine-Tuning with In-Context Examples Paper • 2312.06363 • Published Dec 11, 2023 • 1
MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models Paper • 2306.13394 • Published Jun 23, 2023
Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis Paper • 2405.21075 • Published May 31, 2024 • 21
CAPro: Webly Supervised Learning with Cross-Modality Aligned Prototypes Paper • 2310.09761 • Published Oct 15, 2023
Sinkhorn Distance Minimization for Knowledge Distillation Paper • 2402.17110 • Published Feb 27, 2024
RESTORE: Towards Feature Shift for Vision-Language Prompt Learning Paper • 2403.06136 • Published Mar 10, 2024
Unleashing the Power of Data Tsunami: A Comprehensive Survey on Data Assessment and Selection for Instruction Tuning of Language Models Paper • 2408.02085 • Published Aug 4, 2024 • 17
Leveraging Open Knowledge for Advancing Task Expertise in Large Language Models Paper • 2408.15915 • Published Aug 28, 2024 • 19
Improving Factuality in Large Language Models via Decoding-Time Hallucinatory and Truthful Comparators Paper • 2408.12325 • Published Aug 22, 2024
FlashSloth: Lightning Multimodal Large Language Models via Embedded Visual Compression Paper • 2412.04317 • Published Dec 5, 2024
Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM Paper • 2411.00774 • Published Nov 1, 2024
VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction Paper • 2501.01957 • Published Jan 2025 • 34