How can I use only the text and visual embeddings?

by Labmem009 - opened Feb 25

Feb 25

Interesting work! I want to use the image-text alignment learned by this model's encoder for downstream tasks. How should I do that?

Feb 28

+1. Is it possible to use only the visual encoder for a downstream task, such as classification?
