Can Blip2ForImageTextRetrieval be trained with Trainer?

#5
by wang-sy - opened

In the definition of Blip2ImageTextMatchingModelOutput, the loss is documented as:

Args:
        loss (`torch.FloatTensor` of shape `(1,)`, *optional*, returned when `return_loss` is `True`):
            Contrastive loss for image-text similarity.

However, the loss is not actually computed in the forward method of Blip2ForImageTextRetrieval. Am I missing something, or is the loss calculated somewhere else?

Is training Blip2ForImageTextRetrieval supported by the Trainer?
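For reference, this is roughly the loss I would expect from that docstring. It is only a minimal sketch of a generic CLIP-style symmetric contrastive loss, not necessarily what BLIP-2 uses internally. The function name `contrastive_loss`, the `temperature` default, and the assumption that the model gives me one pooled embedding per image and per text are all mine, not taken from the library.

```python
import torch
import torch.nn.functional as F


def contrastive_loss(image_embeds: torch.Tensor,
                     text_embeds: torch.Tensor,
                     temperature: float = 0.07) -> torch.Tensor:
    """Symmetric image-text contrastive loss (generic CLIP-style sketch).

    Assumes both inputs have shape (batch, dim) and that row i of each
    tensor comes from the same image-text pair.
    """
    # L2-normalize so the dot product is a cosine similarity.
    image_embeds = F.normalize(image_embeds, dim=-1)
    text_embeds = F.normalize(text_embeds, dim=-1)

    # (batch, batch) similarity matrix, scaled by the temperature.
    logits = image_embeds @ text_embeds.t() / temperature

    # The matching pair for row i sits on the diagonal.
    targets = torch.arange(logits.size(0), device=logits.device)

    # Average the image-to-text and text-to-image cross-entropy terms.
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.t(), targets)
    return (loss_i2t + loss_t2i) / 2
```

If the model really does not compute a loss itself, I assume something like this would have to be called from an overridden `compute_loss` in a Trainer subclass, but I would like to confirm that before going down that route.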

Thank you for the great work!
