About MMEB v2 evaluation
Ops-MM-embedding achieved impressive results on MMEB. Could you share the evaluation methods and specific instructions you used when testing on MMEB v2?
Looking forward to it, especially the instructions for video retrieval!
Any update? I also want to know the instructions when evaluating on MMEB v2.
In both training and eval I simply split the instruction and the text in mmeb-train/mmeb-eval and feed them to the model separately—no extra tricks. For video retrieval, I treat the video frames as multiple images and run evaluation with batch size = 1.
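A minimal sketch of the video case, assuming frames are sampled with OpenCV and the model exposes a hypothetical `encode` helper (neither detail is from the original setup, only the frames-as-images idea):

```python
import cv2  # assumption: frames are sampled with OpenCV

def sample_frames(video_path, num_frames=8):
    """Uniformly sample `num_frames` RGB frames from a video file."""
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    frames = []
    for i in range(num_frames):
        cap.set(cv2.CAP_PROP_POS_FRAMES, int(i * total / num_frames))
        ok, frame = cap.read()
        if ok:
            frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    cap.release()
    return frames

# Batch size 1: each video becomes a list of frame images and is encoded
# like a multi-image query. `model.encode` is a hypothetical helper, not
# the actual Ops-MM-embedding API.
def encode_video_query(model, video_path, instruction, text):
    frames = sample_frames(video_path)
    return model.encode(images=frames, instruction=instruction, text=text)
```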
Why split the instruction and text? For an image-qa example in mmeb-eval:
{
    'qry_text': 'What sport can you use this for?',
    'qry_inst': '<|image_1|>\nRepresent the given image with the following question: ',
}
Should the text prompt combine qry_inst and qry_text (maybe <|image_1|> needs to be processed), or should it just use qry_text?
In TIGER-Lab/MMEB-eval, qry_inst and qry_text were merged into the single qry_text field. I separate the instruction from the text so that the instruction can be placed in the corresponding part of the Qwen-VL chat template.
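To make that concrete, here is a minimal sketch, assuming a Qwen2-VL style processor from transformers; putting the instruction in the system turn is one reasonable reading of "the corresponding part of the chat template", not a confirmed detail:

```python
from PIL import Image
from transformers import AutoProcessor

# Assumptions: a Qwen2-VL style processor and a system-turn instruction slot.
processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-7B-Instruct")

instruction = "Represent the given image with the following question:"
text = "What sport can you use this for?"
image = Image.open("example.jpg")

messages = [
    {"role": "system", "content": instruction},
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": text},
    ]},
]

prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=False)
inputs = processor(text=[prompt], images=[image], return_tensors="pt")
# `inputs` can then be fed to the embedding model's forward pass.
```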
So qry_inst is actually not used in MMEB-eval?
In MMEB-eval, the original qry_text is "<|image_1|>\nRepresent the given image with the following question: What sport can you use this for?".
After splitting it into an instruction and a text component, we have
- instruction: "Represent the given image with the following question:"
- text: "<|image_1|>What sport can you use this for?"
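For anyone reproducing this, a rough splitting heuristic along these lines should work; splitting on the first ": " after the image placeholders is an illustrative assumption, not the exact script behind the reported results:

```python
import re

def split_qry(qry_text):
    """Split an MMEB-eval qry_text into (instruction, text).

    Heuristic: strip leading <|image_N|> placeholders, treat everything up
    to the first ': ' as the instruction, and prepend the placeholders to
    the remaining text.
    """
    placeholders = "".join(re.findall(r"<\|image_\d+\|>", qry_text))
    remainder = re.sub(r"<\|image_\d+\|>\n?", "", qry_text)
    if ": " in remainder:
        instruction, text = remainder.split(": ", 1)
        instruction += ":"
    else:
        instruction, text = "", remainder
    return instruction, placeholders + text

qry = "<|image_1|>\nRepresent the given image with the following question: What sport can you use this for?"
inst, text = split_qry(qry)
# inst -> "Represent the given image with the following question:"
# text -> "<|image_1|>What sport can you use this for?"
```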