Pixels Versus Priors: Controlling Knowledge Priors in Vision-Language Models through Visual Counterfacts • Paper • 2505.17127 • Published 17 days ago • 2
Forgotten Polygons: Multimodal Large Language Models are Shape-Blind • Paper • 2502.15969 • Published Feb 21 • 2
What Do VLMs NOTICE? A Mechanistic Interpretability Pipeline for Noise-free Text-Image Corruption and Evaluation • Paper • 2406.16320 • Published Jun 24, 2024 • 3