Multimodal Autoregressive Pre-training of Large Vision Encoders Paper • 2411.14402 • Published Nov 21 • 43
Towards A Unified Agent with Foundation Models Paper • 2307.09668 • Published Jul 18, 2023 • 12
Learning to Retrieve In-Context Examples for Large Language Models Paper • 2307.07164 • Published Jul 14, 2023 • 21
OBELICS: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents Paper • 2306.16527 • Published Jun 21, 2023 • 47
SPAE: Semantic Pyramid AutoEncoder for Multimodal Generation with Frozen LLMs Paper • 2306.17842 • Published Jun 30, 2023 • 9
Towards Language Models That Can See: Computer Vision Through the LENS of Natural Language Paper • 2306.16410 • Published Jun 28, 2023 • 28