								
Model	Image Captioning	Visual Question Answering	Image-Text Matching	Human Metric - Explanation of Violation	Auto Metric - Explanation of Violation	Identify - Explanation of Violation		
Humans				95		92		
Ground-truth Caption _ GPT3 (Oracle)				68	62	74		
BLIP2 FlanT5-XXL (Fine-tuned)	177	57	84	27	24	73		
BLIP2 FlanT5-XL (Fine-tuned)	174	55	81	15	18	60		
Predicted Caption _ GPT3				33	42	59		
BLIP2 FlanT5-XXL (Zero-shot)	120	55	71	0	0	50		
CLIP ViT-L/14 (Zero-shot)			70					
OFA Large (Zero-shot)	0	38						
CoCa ViT-L-14 MSCOCO (Zero-shot)	102		72					
BLIP Large (Zero-shot)	65	39	77					
BLIP2 FlanT5-XXL (Text only FT)	2	24	94