[Error] Error when executing the example code
Hi,
If I run the model with the example code in the repo folder, I get this error:
---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
<ipython-input-4-fd9293295145> in <cell line: 2>()
1 import torch
----> 2 from modeling_siglip import SiglipVisionModel
3
4 DEVICE = torch.device("cuda:0")
5 PATCH_SIZE = 14
/content/siglip-so400m-14-980-flash-attn2-navit/modeling_siglip.py in <module>
40 replace_return_docstrings,
41 )
---> 42 from .configuration_siglip import SiglipConfig, SiglipTextConfig, SiglipVisionConfig
43
44
ImportError: attempted relative import with no known parent package
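A possible workaround I considered (I am not sure it is the intended way) is to edit modeling_siglip.py so the relative import becomes absolute, i.e. change
from .configuration_siglip import SiglipConfig, SiglipTextConfig, SiglipVisionConfig
to
from configuration_siglip import SiglipConfig, SiglipTextConfig, SiglipVisionConfig
(and likewise any other relative imports in that file), and then run the example from inside the checkpoint folder so both files are importable:
import torch
from modeling_siglip import SiglipVisionModel

DEVICE = torch.device("cuda:0")
PATCH_SIZE = 14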
If I instead replace the source files in transformers (e.g., modeling_siglip.py) with the files from this repo, I get a different error.
However, there is the trust_remote_code argument. If I run the code with:
import torch
from transformers import AutoModel
model = AutoModel.from_pretrained("HuggingFaceM4/siglip-so400m-14-384-flash-attn2", trust_remote_code=True)
model.eval().cuda().half()
pixel_values = torch.randn(1, 3, 384, 384).cuda().half()
output = model.vision_model(pixel_values)
It does work, but the model only accepts images at 384×384 resolution. If I pass a 512×512 image, I get a dimension mismatch error from the position embedding.
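For reference, here is my rough understanding of where that mismatch comes from (just my own arithmetic, assuming the usual ViT-style patch and position embedding with patch size 14):
patch_size = 14
num_patches_384 = (384 // patch_size) ** 2  # 27 * 27 = 729 position embeddings in the 384 checkpoint
num_patches_512 = (512 // patch_size) ** 2  # 36 * 36 = 1296 patches for a 512x512 input
assert num_patches_512 != num_patches_384   # hence the position-embedding dimension mismatch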
Could you please modify the example code so it can be executed? How can I run the model successfully on Google Colab?
Hi @StarCycle,
Have you tried model = AutoModel.from_pretrained("HuggingFaceM4/siglip-so400m-14-980-flash-attn2-navit"); model.vision_model?
Thanks @VictorSanh!
It should be OK with:
import torch
from transformers import AutoModel
pixel_values = torch.randn(1, 3, 224, 384).cuda().half() # any resolution here
model = AutoModel.from_pretrained("HuggingFaceM4/siglip-so400m-14-980-flash-attn2-navit", trust_remote_code=True)
model.eval().cuda().half()
output = model.vision_model(pixel_values)
Is it necessary to specify the patch_attention_mask, as in the example in the README?
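In case it is needed, here is my guess at how it would be passed (the argument name and the mask shape of (batch, height_in_patches, width_in_patches) are my assumptions from reading the navit-style modeling_siglip.py, please correct me if wrong):
import torch
from transformers import AutoModel

model = AutoModel.from_pretrained("HuggingFaceM4/siglip-so400m-14-980-flash-attn2-navit", trust_remote_code=True)
model.eval().cuda().half()

patch_size = 14
pixel_values = torch.randn(1, 3, 224, 384).cuda().half()

# True for patches that contain real image content, False for padded patches.
# Here the whole image is real, so the mask is all True; with padded batches
# the padded patch positions would be set to False instead.
patch_attention_mask = torch.ones(
    (1, 224 // patch_size, 384 // patch_size), dtype=torch.bool, device="cuda"
)

with torch.no_grad():
    output = model.vision_model(pixel_values, patch_attention_mask=patch_attention_mask)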