Mementos: A Comprehensive Benchmark for Multimodal Large Language Model Reasoning over Image Sequences Paper • 2401.10529 • Published Jan 19 • 1
ShareGPT4V: Improving Large Multi-Modal Models with Better Captions Paper • 2311.12793 • Published Nov 21, 2023 • 18
Q-Instruct: Improving Low-level Visual Abilities for Multi-modality Foundation Models Paper • 2311.06783 • Published Nov 12, 2023 • 26
Datasets: A Community Library for Natural Language Processing Paper • 2109.02846 • Published Sep 7, 2021 • 8