Investigating Prompting Techniques for Zero- and Few-Shot Visual Question Answering Paper • 2306.09996 • Published Jun 16, 2023
Benchmarking Vision Language Models for Cultural Understanding Paper • 2407.10920 • Published Jul 15, 2024
Contrasting Intra-Modal and Ranking Cross-Modal Hard Negatives to Enhance Visio-Linguistic Compositional Understanding Paper • 2306.08832 • Published Jun 15, 2023
Rendering-Aware Reinforcement Learning for Vector Graphics Generation Paper • 2505.20793 • Published May 27, 2025 • 11
BigDocs: An Open and Permissively-Licensed Dataset for Training Multimodal Models on Document and Code Tasks Paper • 2412.04626 • Published Dec 5, 2024 • 14