Can Blip2ForImageTextRetrieval be trained with Trainer?

by wang-sy - opened Nov 16, 2024

In the definition of Blip2ImageTextMatchingModelOutput, loss is documented as:

Args:
        loss (`torch.FloatTensor` of shape `(1,)`, *optional*, returned when `return_loss` is `True`):
            Contrastive loss for image-text similarity.

However, the loss does not appear to be calculated in the forward pass of Blip2ForImageTextRetrieval. Am I missing something? If not, where is the loss calculated?

Is training Blip2ForImageTextRetrieval supported by the Trainer?
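
For context, here is a minimal sketch of the kind of symmetric image-text contrastive loss the docstring seems to describe. It is not the library's implementation. The function name `contrastive_loss` is hypothetical, and it assumes a square similarity matrix (such as an image-to-text logits tensor) in which row i and column i belong to the same image-text pair.

```python
import torch
import torch.nn.functional as F


def contrastive_loss(logits_per_image: torch.Tensor) -> torch.Tensor:
    """Symmetric cross-entropy over a (batch, batch) image-text similarity matrix.

    Assumption: the matching text for image i sits at column i, so the
    correct class for every row (and every column) is its own index.
    """
    batch_size = logits_per_image.size(0)
    targets = torch.arange(batch_size, device=logits_per_image.device)

    # Image-to-text: each row is a distribution over the texts in the batch.
    loss_i2t = F.cross_entropy(logits_per_image, targets)
    # Text-to-image: transpose so each row is a distribution over the images.
    loss_t2i = F.cross_entropy(logits_per_image.t(), targets)

    return (loss_i2t + loss_t2i) / 2


# Toy check: strongly diagonal similarities give a loss close to zero.
example_logits = torch.eye(3) * 10.0
example_loss = contrastive_loss(example_logits)
```

If the model does not return a loss, one possible workaround would be to subclass Trainer and override its `compute_loss` method to call a function like this on the model's output logits. Whether that is the intended approach is exactly what I would like to confirm.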

Thank you for the great work!
