Model size seems odd

#1
by bbb42 - opened

Google's HF Hub and paper contain only the 384 size for patch 14. Could this be an error?

PyTorch Image Models org

@bbb42 384 isn't divisible by 14; it was an error made in the original SigLIP, and they made an equivalent version for SigLIP 2 (I assume for comparability's sake). It evals pretty much the same at 378 and 384, but with image processing set up for 378 it doesn't discard data in the bottom and right pixels of the image. I didn't bother making a 384x384 version of this for timm or OpenCLIP. SigLIP 1 originally had the 384, and I added the 378 as a fix after I noticed the issue; I couldn't remove the 384 version since people were already using it...
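A quick arithmetic sketch of why 378 is the "clean" size here (patch size 14 is from the discussion above; the exact cropping behavior depends on the image-processing pipeline, so this is just the divisibility math):

```python
# Patch-grid arithmetic for a ViT with patch size 14.
patch = 14
for size in (384, 378):
    n = size // patch      # patches per side
    covered = n * patch    # pixels actually consumed by the patch grid
    print(f"{size}px -> {n}x{n} patches, {size - covered}px discarded per side")
# 384px -> 27x27 patches, 6px discarded per side
# 378px -> 27x27 patches, 0px discarded per side
```

Both sizes yield the same 27x27 patch grid, which is why the eval numbers barely change; 378 just avoids throwing away the 6-pixel strip at the bottom and right.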

PyTorch Image Models org

@rwightman oh, wow, thanks! Now that makes perfect sense :)
