SigLIP2 or SigLIP1
License
BAGEL is licensed under the Apache 2.0 license. It is finetuned from Qwen2.5-7B-Instruct and siglip-so400m-14-980-flash-attn2-navit model, and uses the FLUX.1-schnell VAE model, all under Apache 2.0.
siglip-so400m-14-980-flash-attn2-navit by HuggingFaceM4 is SigLIP1, but your paper says:
"We adopt SigLIP2-so400m/14 [74] with a fixed 384-resolution as the initialization of the ViT encoder. Building"
Thanks for pointing out this issue! We use siglip-so400m-14-384-flash-attn2. The information in the license section has been updated.
I'm still confused.
'siglip-so400m-14-384-flash-attn2' seems to be SigLIP1. But SigLIP2-so400m/14 was mentioned in your paper.
Let me clarify this. We use SigLIP2-so400m/14 with a 384x384 input resolution, then interpolate the position embeddings to 980x980.
This is actually what siglip-so400m-14-384-flash-attn2 has done to siglip-so400m-14-384.
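For anyone following along, here is a rough sketch of what interpolating position embeddings from 384x384 to 980x980 could look like. It is not taken from the BAGEL code. It assumes a patch size of 14 (so a 27x27 grid becomes a 70x70 grid), bilinear interpolation with corners mapped onto corners, and toy 2-dimensional embeddings in place of the learned weights. The function name `interpolate_pos_embed` is hypothetical, and the thread does not say which interpolation mode was actually used.

```python
def interpolate_pos_embed(grid, new_size):
    """Bilinearly resize a square grid of embedding vectors.

    grid: old_size x old_size x dim nested lists.
    Returns new_size x new_size x dim nested lists.
    Corner positions map onto corner positions (an assumption of this sketch).
    """
    old_size = len(grid)
    dim = len(grid[0][0])
    # Factor that maps a new-grid index to a coordinate in the old grid.
    scale = (old_size - 1) / (new_size - 1) if new_size > 1 else 0.0
    out = []
    for i in range(new_size):
        y = i * scale
        y0 = min(int(y), old_size - 1)
        y1 = min(y0 + 1, old_size - 1)
        wy = y - y0
        row = []
        for j in range(new_size):
            x = j * scale
            x0 = min(int(x), old_size - 1)
            x1 = min(x0 + 1, old_size - 1)
            wx = x - x0
            # Blend the four neighbouring embeddings, one component at a time.
            vec = [
                (1 - wy) * ((1 - wx) * grid[y0][x0][d] + wx * grid[y0][x1][d])
                + wy * ((1 - wx) * grid[y1][x0][d] + wx * grid[y1][x1][d])
                for d in range(dim)
            ]
            row.append(vec)
        out.append(row)
    return out


PATCH = 14                   # assumed patch size ("/14" in the model name)
old_size = 384 // PATCH      # 27 patches per side
new_size = 980 // PATCH      # 70 patches per side

# Toy embeddings: each position stores its own (row, col) coordinates.
old_grid = [[[float(r), float(c)] for c in range(old_size)]
            for r in range(old_size)]
new_grid = interpolate_pos_embed(old_grid, new_size)
print(len(new_grid), len(new_grid[0]), len(new_grid[0][0]))
```

In a real PyTorch pipeline the same step would more likely go through `torch.nn.functional.interpolate` on the reshaped embedding tensor; the pure-Python version above only illustrates the idea.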