Model size seems odd

#1
by bbb42 - opened

Google's HF Hub and paper contain only the 384 size for patch 14. Could this be an error?

PyTorch Image Models org

@bbb42 384 isn't divisible by 14; it was an error made in the original SigLIP, and they made an equivalent version for SigLIP 2 (I assume for comparability's sake). It evals pretty much the same at 378 and 384, but with image processing set up for 378 it doesn't discard data in the bottom and right pixels of the image. I didn't bother making a 384x384 version of this for timm or OpenCLIP. SigLIP 1 originally had the 384, and I added the 378 as a fix after I noticed the issue; I couldn't remove the 384 version since people were already using it...
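A quick arithmetic sketch of why 378 is the "clean" size here (patch size 14 is from the discussion above; the exact cropping behavior depends on the image-processing pipeline, so this is just the divisibility math):

```python
# Patch-grid arithmetic for a ViT with patch size 14.
patch = 14
for size in (384, 378):
    n = size // patch      # patches per side
    covered = n * patch    # pixels actually consumed by the patch grid
    print(f"{size}px -> {n}x{n} patches, {size - covered}px discarded per side")
# 384px -> 27x27 patches, 6px discarded per side
# 378px -> 27x27 patches, 0px discarded per side
```

Both sizes yield the same 27x27 patch grid, which is why the eval numbers barely change; 378 just avoids throwing away the 6-pixel strip at the bottom and right.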

PyTorch Image Models org

@rwightman oh, wow, thanks! Now that makes perfect sense :)
