How to perform retrieval using fused [image, text] as the input query?

#65

by ququwowo - opened 6 days ago

Hi All!

Could you please advise on the best option for a user query that combines [text, image]? More generally, how can I generate a "fused" embedding for [text, image] that works well with jina-v4?

For example, suppose the user wants to retrieve documents using the query ["can you identify the mechanical tool type in this image and how should I operate this tool?" + img_of_tool]. In this case both the image and the text are important, and I want jina-v4 to return documents that discuss both the tool and its operating procedure.
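
To make the question concrete, here is a minimal sketch of one naive baseline, late fusion: embed the text and the image separately, L2-normalize both vectors, take a weighted average, and renormalize. This is only an illustration and not necessarily what jina-v4 recommends. It assumes the text and image embeddings share one vector space and dimension, and the helper names (`l2_normalize`, `fuse_embeddings`, `rank_documents`) and the `text_weight` parameter are hypothetical. The toy 2-d vectors stand in for real embeddings, which would come from the model.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def fuse_embeddings(text_emb, image_emb, text_weight=0.5):
    """Hypothetical late fusion: weighted average of two unit vectors, renormalized.

    Assumes both embeddings come from the same model and live in the same space.
    """
    if len(text_emb) != len(image_emb):
        raise ValueError("text and image embeddings must have the same dimension")
    if not 0.0 <= text_weight <= 1.0:
        raise ValueError("text_weight must be between 0 and 1")
    t = l2_normalize(text_emb)
    i = l2_normalize(image_emb)
    mixed = [text_weight * a + (1.0 - text_weight) * b for a, b in zip(t, i)]
    return l2_normalize(mixed)


def rank_documents(query_emb, doc_embs):
    """Return (doc_index, cosine_similarity) pairs, best match first."""
    q = l2_normalize(query_emb)
    scores = []
    for idx, doc in enumerate(doc_embs):
        d = l2_normalize(doc)
        scores.append((idx, sum(a * b for a, b in zip(q, d))))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy stand-ins for real embeddings (assumption: real vectors come from the model).
text_emb = [1.0, 0.0]    # "how should I operate this tool?"
image_emb = [0.0, 1.0]   # img_of_tool
doc_embs = [
    [1.0, 0.0],  # document about operating procedures only
    [1.0, 1.0],  # document about both the tool and its operation
    [0.0, 1.0],  # document about the tool only
]

fused = fuse_embeddings(text_emb, image_emb, text_weight=0.5)
ranking = rank_documents(fused, doc_embs)
```

With equal weights, the document that covers both aspects ranks first in this toy setup. Whether that carries over to real jina-v4 embeddings is exactly what I am unsure about.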

Thank you!

jupyterjazz

Jina AI org 5 days ago

Hi @ququwowo ,
