How can I use this code to perform the "temporal image classification" task described in the paper?

#8
by chenyuming - opened

Hello, I did not find the inference code for the 'temporal image classification' task.
Could you tell me where it is?

Thanks very much!

Microsoft org

Hi, @chenyuming. Have you tried the snippet in the README?

Hi, @fepegar. The snippet in the README performs a 'Temporal Sentence Similarity' analysis, which is a different task discussed in the paper. I have the following three questions and would be very grateful if you could answer them.

  1. I am confused about the "zero-shot temporal image classification" task in the paper. According to the paper, this task was performed after "Fine-tuning BioViL-T for report generation." Is the currently open-sourced model the one obtained after this fine-tuning?

  2. Is the biovil_t_image_model_proj_size_128 a single-layer linear head on the image encoder, or a multi-layer classification head attached to the BioViL-T image encoder?

  3. Is there any evaluation code for Section F.4 “Auto-regressive prompting for zero-shot temporal image classification” on GitHub?

Thank you in advance for your reply and for your contribution to the community.
