Introducing ConTextual: How well can your Multimodal model jointly reason over text and image in text-rich scenes? • Mar 5
ConTextual: Evaluating Context-Sensitive Text-Rich Visual Reasoning in Large Multimodal Models • Paper • 2401.13311 • Published Jan 24