How can I use this code to perform the "temporal image classification" task described in the paper?
Hello, I could not find the inference code for the 'temporal image classification' task.
Could you tell me where it is?
Thanks very much!
Hi @fepegar. The snippet in the README performs a 'Temporal Sentence Similarity' analysis, which is a different task discussed in the paper. I have the following three questions and would be very grateful if you could answer them.
1. What confuses me is the "zero-shot temporal image classification" task in the paper. According to the paper, this task was performed after "Fine-tuning BioViL-T for report generation." Is the currently released open-source model the checkpoint obtained after this fine-tuning?
2. Is `biovil_t_image_model_proj_size_128` a single-layer linear head on the image encoder, or a multi-layer classification head attached to the BioViL-T image encoder?
3. Is the evaluation code for Section F.4, "Auto-regressive prompting for zero-shot temporal image classification", available on GitHub?
Thank you for your reply and for your contributions to the community.