arxiv:2502.10273

Probing Perceptual Constancy in Large Vision Language Models

Published on Feb 14

Authors:

Haoran Sun ,

Abstract

Perceptual constancy is the ability to maintain stable perceptions of objects despite changes in sensory input, such as variations in distance, angle, or lighting. This ability is crucial for recognizing visual information in a dynamic world, making it essential for Vision-Language Models (VLMs). However, whether VLMs are currently and theoretically capable of mastering this ability remains underexplored. In this study, we evaluated 33 VLMs using 253 experiments across three domains: color, size, and shape constancy. The experiments included single-image and video adaptations of classic cognitive tasks, along with novel tasks in in-the-wild conditions, to evaluate the models' recognition of object properties under varying conditions. We found significant variability in VLM performance, with models performance in shape constancy clearly dissociated from that of color and size constancy.

View arXiv page View PDF Add to collection

Community

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

Your need to confirm your account before you can post a new comment.

· Sign up or log in to comment

Upvote

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2502.10273 in a model README.md to link it from this page.

Datasets citing this paper 1

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2502.10273 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.