SigLIP or SigLIP2 encoder?
Hi @GopiUppari,
I am familiar with SigLIP.
However, the Gemma3 paper does not state whether SigLIP or SigLIP2 was used. The config does not settle it either: the architecture is the same, so both are defined as siglip_vision_model.
Did Gemma3 utilize the SigLIP2 or SigLIP checkpoints?
Best,
Orr
I'm also curious whether the siglip_vision_model's embeddings remain general purpose (i.e., frozen during Gemma training) or whether SigLIP was fine-tuned to improve Gemma's performance.
According to the Gemma3 paper, they used SigLIP rather than SigLIP 2, and they froze its weights during training for "simplicity". However, the paper does not state whether the weights they used are the same as the publicly released SigLIP checkpoints.
https://arxiv.org/pdf/2503.19786
The paper only says "We use a vision encoder based on SigLIP (Zhai et al., 2023)." That could mean SigLIP2, SigLIP, or even an encoder from PaliGemma or similar.
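Since the paper and the config leave this open, one way to narrow it down would be to compare the vision tower weights in the Gemma3 checkpoint against each public candidate, tensor by tensor. Below is a minimal sketch of that comparison, not a verified procedure. It assumes the weights have already been loaded and flattened into plain `{parameter_name: list of floats}` mappings with matching names. The helper names (`compare_checkpoints`, `match_scores`), the parameter names, and the toy numbers are all hypothetical and only illustrate the logic; they are not real checkpoint values.

```python
def compare_checkpoints(a, b, abs_tol=1e-6):
    """Compare two {param_name: flat list of floats} mappings.

    Returns {param_name: status}, where status is one of
    "match", "differs", "shape mismatch", or "missing".
    """
    report = {}
    for name in sorted(set(a) | set(b)):
        if name not in a or name not in b:
            report[name] = "missing"
        elif len(a[name]) != len(b[name]):
            report[name] = "shape mismatch"
        elif all(abs(x - y) <= abs_tol for x, y in zip(a[name], b[name])):
            report[name] = "match"
        else:
            report[name] = "differs"
    return report


def match_scores(target, candidates, abs_tol=1e-6):
    """Fraction of parameters in each candidate that match the target."""
    scores = {}
    for label, weights in candidates.items():
        report = compare_checkpoints(target, weights, abs_tol)
        matched = sum(status == "match" for status in report.values())
        scores[label] = matched / max(len(report), 1)
    return scores


# Toy, made-up values standing in for real tensors (assumption, not real data).
gemma_vision_tower = {
    "patch_embedding.weight": [0.10, 0.20],
    "layer0.attn.weight": [0.30, 0.40],
}
candidates = {
    "candidate_a": {
        "patch_embedding.weight": [0.10, 0.20],
        "layer0.attn.weight": [0.30, 0.40],
    },
    "candidate_b": {
        "patch_embedding.weight": [0.10, 0.20],
        "layer0.attn.weight": [0.90, 0.40],
    },
}

scores = match_scores(gemma_vision_tower, candidates)
print(scores)
```

In practice the parameter names would likely need to be mapped between releases, and a looser tolerance may be needed if the checkpoints are stored in different dtypes. A score close to 1.0 for one candidate would suggest the frozen encoder was taken from it unchanged; low scores for every candidate would suggest a separately trained or fine-tuned encoder.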