Marqo/marqo-fashionSigLIP · Text Embeddings

Dec 11, 2024

Is it possible to use the text encoder on its own to extract features and use it as a text embedding model, or does the model only perform well when images are involved?

Jesse-marqo

Marqo org Dec 12, 2024

Hi @omarabb315, it can be used for text only. The model was trained with text-text understanding alongside text-image.

omarabb315

Dec 16, 2024

Thank you. Which would be better for text-text tasks: this model or Marqo/marqo-gcl-e5-large-v2-130?

Jesse-marqo

Marqo org Dec 17, 2024

It will depend on the data you have. I would suggest testing both.
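
One way to run that comparison on your own data is a small retrieval check: embed a handful of queries and candidate texts with each model and count how often the labelled match ranks first. The sketch below is only an illustration; `recall_at_1`, `cosine`, and the `encode` callable are hypothetical names, and `encode` stands for whatever function turns a list of strings into a list of vectors for the model under test.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]
# Hypothetical interface: any function mapping texts to one vector per text.
Encoder = Callable[[List[str]], List[Vector]]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recall_at_1(encode: Encoder, queries: List[str], docs: List[str],
                gold: List[int]) -> float:
    """Fraction of queries whose most similar document is the labelled one.

    gold[i] is the index in `docs` of the correct match for queries[i].
    """
    query_vecs = encode(queries)
    doc_vecs = encode(docs)
    hits = 0
    for query_vec, target in zip(query_vecs, gold):
        # Index of the document with the highest cosine similarity.
        best = max(range(len(doc_vecs)),
                   key=lambda j: cosine(query_vec, doc_vecs[j]))
        hits += int(best == target)
    return hits / len(queries)
```

Calling `recall_at_1` once per model on the same queries, documents, and labels gives two directly comparable numbers. A labelled set drawn from your own domain is what makes the comparison meaningful.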

omarabb315

Dec 31, 2024

Can you please help me export the text feature extraction model into a SentenceTransformer class?

Jesse-marqo

Marqo org Jan 29

@omarabb315 does using transformers work?

from transformers import AutoModel, AutoProcessor
model = AutoModel.from_pretrained('Marqo/marqo-fashionSigLIP', trust_remote_code=True)
processor = AutoProcessor.from_pretrained('Marqo/marqo-fashionSigLIP', trust_remote_code=True)

import torch
from PIL import Image

image = [Image.open("docs/fashion-hippo.png")]
text = ["a hat", "a t-shirt", "shoes"]
processed = processor(text=text, images=image, padding='max_length', return_tensors="pt")

with torch.no_grad():
    image_features = model.get_image_features(processed['pixel_values'], normalize=True)
    text_features = model.get_text_features(processed['input_ids'], normalize=True)

    text_probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print("Label probs:", text_probs)
# [0.98379946, 0.01294010, 0.00326044]

omarabb315

Jan 30

Thank you so much for the reply.
Actually, I need to use the SentenceTransformer class ...
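
If a drop-in object with an `encode` method is enough, one option is a thin adapter around the transformers snippet above. This is a hedged sketch, not a real SentenceTransformer subclass. `TextOnlyEncoder` and `load_marqo_text_encoder` are hypothetical names. It assumes the processor accepts `text=` without `images=`, which this thread does not confirm. It returns plain lists rather than the NumPy arrays SentenceTransformer's `encode` returns by default.

```python
from typing import Any, List


class TextOnlyEncoder:
    """Duck-typed stand-in exposing an encode() method; not a real subclass."""

    def __init__(self, model: Any, processor: Any) -> None:
        self.model = model
        self.processor = processor

    def encode(self, sentences: List[str], batch_size: int = 32) -> List[List[float]]:
        """Return one normalized embedding (list of floats) per sentence."""
        embeddings: List[List[float]] = []
        for start in range(0, len(sentences), batch_size):
            batch = sentences[start:start + batch_size]
            # Same calls as the snippet above, with the image input dropped
            # (assumption: the processor works with text only).
            processed = self.processor(text=batch, padding="max_length",
                                       return_tensors="pt")
            features = self.model.get_text_features(processed["input_ids"],
                                                    normalize=True)
            embeddings.extend(features.tolist())
        return embeddings


def load_marqo_text_encoder(name: str = "Marqo/marqo-fashionSigLIP") -> TextOnlyEncoder:
    """Load the checkpoint with transformers and freeze it for inference."""
    from transformers import AutoModel, AutoProcessor  # imported lazily

    model = AutoModel.from_pretrained(name, trust_remote_code=True)
    processor = AutoProcessor.from_pretrained(name, trust_remote_code=True)
    # eval() plus requires_grad_(False) so encode() builds no autograd graph.
    model.eval().requires_grad_(False)
    return TextOnlyEncoder(model, processor)
```

Usage would look like `encoder = load_marqo_text_encoder()` followed by `encoder.encode(["a hat", "shoes"])`. Code that only calls `encode` and can work with lists may accept this object, but anything relying on other SentenceTransformer methods will not.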