(The following contents are from the ViLT repo.)

Dataset Preparation

We use seven datasets: Google Conceptual Captions (GCC), Stony Brook University Captions (SBU), Visual Genome (VG), COCO Captions (COCO), Flickr 30K Captions (F30K), Visual Question Answering v2 (VQAv2), and Natural Language for Visual Reasoning 2 (NLVR2).

We do not distribute the datasets because of licensing restrictions; please download them yourself. We use pyarrow to serialize the datasets; the conversion scripts are located in vilt/utils/write_*.py. Organize each dataset as shown below and run the corresponding make_arrow function to convert it to a pyarrow binary file.

GCC

https://ai.google.com/research/ConceptualCaptions/download

GCC provides tuples of image URL and caption. Note that a sizable portion of the URLs are no longer accessible. Write your own download script and organize the dataset in the following structure.

root
├── images_train
│   ├── 0000                # First four letters of image name
│   │   ├── 0000000         # Image Binary
│   │   ├── 0000001
│   │   └── ...
│   ├── 0001
│   │   ├── 0001000
│   │   ├── 0001001
│   │   └── ...
│   └── ...
├── images_val
│   ├── 0000
│   │   └── ...
│   └── ...
├── train_annot.json        # List of (image_file_path, caption) tuple
└── val_annot.json          # List of (image_file_path, caption) tuple

from vlmo.utils.write_conceptual_caption import make_arrow
make_arrow(root, arrows_root)
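
The GCC layout above can be produced with a few small helpers once the images have been downloaded. This is a minimal sketch, not the repository's own script: `shard_dir`, `place_image`, and `write_annotations` are hypothetical names, and it assumes that image names are zero-padded strings and that each annotation entry is stored as a two-item list, since JSON has no tuple type. Check write_conceptual_caption.py for what make_arrow actually expects before relying on it.

```python
import json
import shutil
from pathlib import Path


def shard_dir(root, split, image_name):
    """Return root/images_<split>/<first four characters of image_name>."""
    return Path(root) / f"images_{split}" / image_name[:4]


def place_image(root, split, image_name, src_path):
    """Copy a downloaded image into its shard directory and return the new path."""
    target_dir = shard_dir(root, split, image_name)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / image_name
    shutil.copyfile(src_path, target)
    return target


def write_annotations(root, split, pairs):
    """Write (image_file_path, caption) pairs to root/<split>_annot.json."""
    out = Path(root) / f"{split}_annot.json"
    with open(out, "w", encoding="utf-8") as f:
        # Assumption: each pair is serialized as a two-item JSON list.
        json.dump([[str(path), caption] for path, caption in pairs], f)
    return out
```

Calling `place_image(root, "train", "0001000", downloaded_file)` would put the file under images_train/0001/, matching the tree above.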

SBU

http://www.cs.virginia.edu/~vicente/sbucaptions/

Like GCC, SBU provides tuples of image URL and caption, and a sizable portion of its URLs are also no longer accessible. Write your own download script and organize the dataset in the following structure.

root
├── images_train
│   ├── 0000                # First four letters of image name
│   │   ├── 0000000         # Image Binary
│   │   ├── 0000001
│   │   └── ...
│   ├── 0001
│   │   ├── 0001000
│   │   ├── 0001001
│   │   └── ...
│   └── ...
└── annot.json              # List of (image_file_path, caption) tuple

from vlmo.utils.write_sbu import make_arrow
make_arrow(root, arrows_root)
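
Because many of the source URLs are dead, a download script needs to skip failures rather than stop at the first one. The sketch below shows one way to do that with the standard library only. `fetch_url` and `download_pairs` are hypothetical names, the timeout value is arbitrary, and the `fetch` parameter exists only so the loop can be exercised without network access.

```python
import http.client
import urllib.request


def fetch_url(url, timeout=10):
    """Return the raw bytes at url, or None if the request fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read()
    except (OSError, ValueError, http.client.HTTPException):
        # OSError covers URLError, HTTPError, and socket timeouts;
        # ValueError covers malformed URLs.
        return None


def download_pairs(pairs, fetch=fetch_url):
    """Yield (index, image_bytes, caption) for every URL that still resolves."""
    for index, (url, caption) in enumerate(pairs):
        data = fetch(url)
        if data is None:
            continue  # dead link: skip it and keep going
        yield index, data, caption
```

The yielded index can be zero-padded to form the image name, so that surviving images still map back to their original rows.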

VG

http://visualgenome.org/api/v0/api_home.html

Download image part 1, image part 2, and the region descriptions.

root
├── images
│   ├── VG_100K
│   │   ├── 10.jpg
│   │   ├── 107899.jpg
│   │   └── ...
│   ├── VG_100K_2
│   │   ├── 1.jpg
│   │   ├── 100.jpg
│   │   └── ...
│   └── ...
└── annotations
    └── region_descriptions.json

from vlmo.utils.write_vg import make_arrow
make_arrow(root, arrows_root)
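
To sanity-check the download before conversion, it can help to look at how region descriptions map to images. The sketch below assumes that region_descriptions.json is a list of entries, each with a "regions" list whose items carry "image_id" and "phrase" keys; verify this against your copy of the file. `phrases_by_image` is a hypothetical helper, not part of the repository.

```python
from collections import defaultdict


def phrases_by_image(region_descriptions):
    """Group region phrases by image id.

    Assumes a list of {"regions": [{"image_id": ..., "phrase": ...}, ...]}.
    """
    captions = defaultdict(list)
    for entry in region_descriptions:
        for region in entry["regions"]:
            captions[region["image_id"]].append(region["phrase"])
    return dict(captions)
```

Each image id then maps to the list of phrases that would serve as its captions.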

COCO

https://cocodataset.org/#download

Download the 2014 train images, the 2014 val images, and the Karpathy split.

root
├── train2014
│   ├── COCO_train2014_000000000009.jpg
│   └── ...
├── val2014
│   ├── COCO_val2014_000000000042.jpg
│   └── ...
└── karpathy
    └── dataset_coco.json

from vlmo.utils.write_coco_karpathy import make_arrow
make_arrow(root, arrows_root)
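
The Karpathy split file decides which images belong to which split, independent of the train2014/val2014 directories. As a rough illustration, the sketch below groups captions by split. It assumes the commonly distributed format: a top-level "images" list whose items have "filename", "split", and "sentences" keys, with each sentence holding its text under "raw". `captions_by_split` is a hypothetical helper.

```python
from collections import defaultdict


def captions_by_split(karpathy):
    """Map split name -> {image filename -> list of raw captions}."""
    splits = defaultdict(dict)
    for image in karpathy["images"]:
        captions = [sentence["raw"] for sentence in image["sentences"]]
        splits[image["split"]][image["filename"]] = captions
    return dict(splits)
```

Printing the number of images per split is a quick way to confirm that the file was downloaded completely.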

F30K

http://bryanplummer.com/Flickr30kEntities/

Sign the Flickr images request form and download the Karpathy split.

root
├── flickr30k-images
│   ├── 1000092795.jpg
│   └── ...
└── karpathy
    └── dataset_flickr30k.json

from vlmo.utils.write_f30k_karpathy import make_arrow
make_arrow(root, arrows_root)

VQAv2

https://visualqa.org/download.html

Download the COCO 2014 train images, 2014 val images, and 2015 test images, along with the annotations (train, val) and questions (train, val, test).

root
├── train2014
│   ├── COCO_train2014_000000000009.jpg
│   └── ...
├── val2014
│   ├── COCO_val2014_000000000042.jpg
│   └── ...
├── test2015
│   ├── COCO_test2015_000000000001.jpg
│   └── ...
├── v2_OpenEnded_mscoco_train2014_questions.json
├── v2_OpenEnded_mscoco_val2014_questions.json
├── v2_OpenEnded_mscoco_test2015_questions.json
├── v2_OpenEnded_mscoco_test-dev2015_questions.json
├── v2_mscoco_train2014_annotations.json
└── v2_mscoco_val2014_annotations.json

from vlmo.utils.write_vqa import make_arrow
make_arrow(root, arrows_root)
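
Questions and annotations live in separate files and are linked by question id, and the test splits ship without annotations. The sketch below illustrates that join. It assumes the questions file has a "questions" list with "image_id", "question_id", and "question" keys, and the annotations file has an "annotations" list with "question_id" and "answers" keys. `join_questions_and_answers` is a hypothetical helper, not the repository's conversion code.

```python
def join_questions_and_answers(questions, annotations=None):
    """Pair each question with its answers; answers is None when unannotated."""
    by_qid = {}
    if annotations is not None:
        by_qid = {a["question_id"]: a for a in annotations["annotations"]}

    rows = []
    for q in questions["questions"]:
        ann = by_qid.get(q["question_id"])
        rows.append({
            "image_id": q["image_id"],
            "question_id": q["question_id"],
            "question": q["question"],
            # Test splits have no annotation file, so answers stay None.
            "answers": [a["answer"] for a in ann["answers"]] if ann else None,
        })
    return rows
```

For the test and test-dev question files, call it with `annotations=None`.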

NLVR2

Clone the repository and sign the request form to download the images.

root
├── images/train
│   ├── 0
│   │   ├── train-10108-0-img0.png
│   │   └── ...
│   ├── 1
│   │   ├── train-10056-0-img0.png
│   │   └── ...
│   └── ...
├── dev
│   ├── dev-0-0-img0.png
│   └── ...
├── test1
│   ├── test1-0-0-img0.png
│   └── ...
├── nlvr
├── nlvr2
└── README.md

from vlmo.utils.write_nlvr2 import make_arrow
make_arrow(root, arrows_root)
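
The image file names in the tree above follow a visible pattern (split, two numeric fields, then img0 or similar). A small parser like the one below can be used to check that downloaded files match it. This is only a sketch inferred from the example names shown here: `parse_image_name` is hypothetical, and the meaning of the two numeric fields is not stated in this document, so they are given the neutral names "first" and "second".

```python
import re

# Pattern inferred from names such as train-10108-0-img0.png and test1-0-0-img0.png.
_IMAGE_NAME = re.compile(
    r"^(?P<split>[a-z0-9]+)-(?P<first>\d+)-(?P<second>\d+)-img(?P<side>\d)\.png$"
)


def parse_image_name(name):
    """Split an image file name into its parts, or return None if it does not match."""
    match = _IMAGE_NAME.match(name)
    if match is None:
        return None
    return {
        "split": match["split"],
        "first": int(match["first"]),
        "second": int(match["second"]),
        "side": int(match["side"]),
    }
```

Files for which the function returns None are worth inspecting before running the conversion.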

WikiBK (Text only data)

from vlmo.utils.write_wikibk import make_arrow
make_arrow(root, arrows_root)