Model Image Captioning Visual Question Answering Image-Text Matching Human Metric - Explanation of Violation Auto Metric - Explanation of Violation identify - Explanation of Violation
Humans 95 92
Ground-truth Caption _ GPT3 (Oracle) 68 62 74
BLIP2 FlanT5-XXL (Fine-tuned) 177 57 84 27 24 73
BLIP2 FlanT5-XL (Fine-tuned) 174 55 81 15 18 60
Predicted Caption _ GPT3 33 42 59
BLIP2 FlanT5-XXL (Zero-shot) 120 55 71 0 0 50
CLIP ViT-L/14 (Zero-shot) 70
OFA Large (Zero-shot) 0 38
CoCa ViT-L-14 MSCOCO (Zero-shot) 102 72
BLIP Large (Zero-shot) 65 39 77
BLIP2 FlanT5-XXL (Text only FT) 2 24 94