(The following contents are from the ViLT repo.)
Dataset Preparation
We use seven datasets: Google Conceptual Captions (GCC), Stony Brook University Captions (SBU), Visual Genome (VG), COCO Captions (COCO), Flickr 30K Captions (F30K), Visual Question Answering v2 (VQAv2), and Natural Language for Visual Reasoning 2 (NLVR2).
We do not distribute the datasets because of licensing restrictions.
Please download them yourself.
We use pyarrow to serialize the datasets; the conversion scripts are located in vlmo/utils/write_*.py.
Please organize each dataset as described below and run the corresponding make_arrow function to convert it to a pyarrow binary file.
GCC
https://ai.google.com/research/ConceptualCaptions/download
GCC provides (image URL, caption) tuples. Note that a sizable portion of the URLs are no longer accessible. Write your own download script and organize the dataset in the following structure.
root
├── images_train
│   ├── 0000  # First four letters of image name
│   │   ├── 0000000  # Image Binary
│   │   ├── 0000001
│   │   └── ...
│   ├── 0001
│   │   ├── 0001000
│   │   ├── 0001001
│   │   └── ...
│   └── ...
├── images_val
│   ├── 0000
│   │   └── ...
│   └── ...
├── train_annot.json  # List of (image_file_path, caption) tuple
└── val_annot.json  # List of (image_file_path, caption) tuple
from vlmo.utils.write_conceptual_caption import make_arrow
make_arrow(root, arrows_root)
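The sharded layout above can be produced by mapping a running image index to its target path. This is a minimal sketch rather than the script the authors used: the seven-digit zero padding is inferred from the example names in the tree, storing relative paths in the annotation file is an assumption, and `shard_path` and `write_annotations` are hypothetical helper names. The download step itself is left out.

```python
import json
import os


def shard_path(split_dir: str, index: int) -> str:
    """Return the relative path for image number `index`.

    Assumption: image names are seven-digit, zero-padded indices, and the
    shard directory is the first four characters of that name.
    """
    name = f"{index:07d}"
    return os.path.join(split_dir, name[:4], name)


def write_annotations(root: str, annot_name: str, pairs: list) -> str:
    """Write a list of (image_file_path, caption) pairs as a JSON list."""
    out_path = os.path.join(root, annot_name)
    with open(out_path, "w", encoding="utf-8") as fp:
        json.dump(pairs, fp)
    return out_path
```

A download loop would call `shard_path("images_train", i)` for each image it manages to fetch, save the bytes there, and append the path and caption to the list passed to `write_annotations`.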
SBU
http://www.cs.virginia.edu/~vicente/sbucaptions/
Like GCC, SBU provides (image URL, caption) tuples, and a sizable portion of the URLs are no longer accessible. Write your own download script and organize the dataset in the following structure.
root
├── images_train
│   ├── 0000  # First four letters of image name
│   │   ├── 0000000  # Image Binary
│   │   ├── 0000001
│   │   └── ...
│   ├── 0001
│   │   ├── 0001000
│   │   ├── 0001001
│   │   └── ...
│   └── ...
└── annot.json  # List of (image_file_path, caption) tuple
from vlmo.utils.write_sbu import make_arrow
make_arrow(root, arrows_root)
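Because many URLs fail, it can help to check that every annotation entry has a matching image on disk before converting. The sketch below assumes each entry in annot.json is an `[image_file_path, caption]` pair and that the image sits under `images_train/<first four characters>/<name>`; `find_missing_images` is a hypothetical helper, not part of the repository.

```python
import json
import os


def find_missing_images(root: str, annot_name: str = "annot.json") -> list:
    """Return the annotated image paths that have no file under root/images_train.

    Assumption: each annotation entry is [image_file_path, caption] and the
    image is stored at images_train/<first four characters of name>/<name>.
    """
    with open(os.path.join(root, annot_name), encoding="utf-8") as fp:
        entries = json.load(fp)
    missing = []
    for image_file_path, _caption in entries:
        name = os.path.basename(image_file_path)
        expected = os.path.join(root, "images_train", name[:4], name)
        if not os.path.isfile(expected):
            missing.append(image_file_path)
    return missing
```

An empty result means every annotated image is present in the expected shard directory.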
VG
http://visualgenome.org/api/v0/api_home.html
Download image part1, image part2 and region descriptions
root
├── images
│   ├── VG_100K
│   │   ├── 10.jpg
│   │   ├── 107899.jpg
│   │   └── ...
│   ├── VG_100K_2
│   │   ├── 1.jpg
│   │   ├── 100.jpg
│   │   └── ...
│   └── ...
└── annotations
    └── region_descriptions.json
from vlmo.utils.write_vg import make_arrow
make_arrow(root, arrows_root)
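To inspect the region descriptions before conversion, the phrases can be grouped by image. This sketch assumes region_descriptions.json is a list of per-image records, each holding a `regions` list whose items carry `image_id` and `phrase` keys; verify these key names against your download. `load_region_descriptions` and `group_phrases` are hypothetical helpers.

```python
import json
import os
from collections import defaultdict


def load_region_descriptions(root: str) -> list:
    """Load annotations/region_descriptions.json from the dataset root."""
    path = os.path.join(root, "annotations", "region_descriptions.json")
    with open(path, encoding="utf-8") as fp:
        return json.load(fp)


def group_phrases(region_descriptions: list) -> dict:
    """Map image_id -> list of region phrases.

    Assumption: each record has a "regions" list whose items carry
    "image_id" and "phrase" keys.
    """
    phrases = defaultdict(list)
    for record in region_descriptions:
        for region in record["regions"]:
            phrases[region["image_id"]].append(region["phrase"])
    return dict(phrases)
```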
COCO
https://cocodataset.org/#download
Download the 2014 train images, the 2014 val images, and the Karpathy split.
root
├── train2014
│   ├── COCO_train2014_000000000009.jpg
│   └── ...
├── val2014
│   ├── COCO_val2014_000000000042.jpg
│   └── ...
└── karpathy
    └── dataset_coco.json
from vlmo.utils.write_coco_karpathy import make_arrow
make_arrow(root, arrows_root)
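A quick way to confirm that the Karpathy split file was downloaded intact is to count images per split. The sketch assumes dataset_coco.json is an object with an `images` list whose items carry `split`, `filename`, and a `sentences` list with `raw` captions; treat these key names as assumptions to check against your copy. The helper names are hypothetical.

```python
import json
import os
from collections import Counter


def load_karpathy(root: str, filename: str = "dataset_coco.json") -> dict:
    """Load the Karpathy split file from root/karpathy/."""
    with open(os.path.join(root, "karpathy", filename), encoding="utf-8") as fp:
        return json.load(fp)


def split_sizes(karpathy: dict) -> Counter:
    """Count images per split (assumed key: image["split"])."""
    return Counter(image["split"] for image in karpathy["images"])


def captions_by_file(karpathy: dict) -> dict:
    """Map image filename -> list of raw captions (assumed keys)."""
    return {
        image["filename"]: [s["raw"] for s in image["sentences"]]
        for image in karpathy["images"]
    }
```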
F30K
http://bryanplummer.com/Flickr30kEntities/
Sign the Flickr images request form and download the Karpathy split.
root
├── flickr30k-images
│   ├── 1000092795.jpg
│   └── ...
└── karpathy
    └── dataset_flickr30k.json
from vlmo.utils.write_f30k_karpathy import make_arrow
make_arrow(root, arrows_root)
VQAv2
https://visualqa.org/download.html
Download COCO 2014 train images, 2014 val images, 2015 test images, annotations (train, val), and questions (train, val, test)
root
├── train2014
│   ├── COCO_train2014_000000000009.jpg
│   └── ...
├── val2014
│   ├── COCO_val2014_000000000042.jpg
│   └── ...
├── test2015
│   ├── COCO_test2015_000000000001.jpg
│   └── ...
├── v2_OpenEnded_mscoco_train2014_questions.json
├── v2_OpenEnded_mscoco_val2014_questions.json
├── v2_OpenEnded_mscoco_test2015_questions.json
├── v2_OpenEnded_mscoco_test-dev2015_questions.json
├── v2_mscoco_train2014_annotations.json
└── v2_mscoco_val2014_annotations.json
from vlmo.utils.write_vqa import make_arrow
make_arrow(root, arrows_root)
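VQAv2 needs the most files of any dataset here, so a pre-flight check can save a failed conversion run. The sketch below only tests for the directories and JSON files named in the tree above; `missing_entries` is a hypothetical helper and does not inspect file contents.

```python
import os

# Directory and file names copied from the layout listed above.
EXPECTED_DIRS = ["train2014", "val2014", "test2015"]
EXPECTED_FILES = [
    "v2_OpenEnded_mscoco_train2014_questions.json",
    "v2_OpenEnded_mscoco_val2014_questions.json",
    "v2_OpenEnded_mscoco_test2015_questions.json",
    "v2_OpenEnded_mscoco_test-dev2015_questions.json",
    "v2_mscoco_train2014_annotations.json",
    "v2_mscoco_val2014_annotations.json",
]


def missing_entries(root: str) -> list:
    """Return the expected directories and files that are absent under root."""
    missing = [d for d in EXPECTED_DIRS if not os.path.isdir(os.path.join(root, d))]
    missing += [f for f in EXPECTED_FILES if not os.path.isfile(os.path.join(root, f))]
    return missing
```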
NLVR2
Clone the repository and sign the request form to download the images.
root
├── images/train
│   ├── 0
│   │   ├── train-10108-0-img0.png
│   │   └── ...
│   ├── 1
│   │   ├── train-10056-0-img0.png
│   │   └── ...
│   └── ...
├── dev
│   ├── dev-0-0-img0.png
│   └── ...
├── test1
│   ├── test1-0-0-img0.png
│   └── ...
├── nlvr
├── nlvr2
└── README.md
from vlmo.utils.write_nlvr2 import make_arrow
make_arrow(root, arrows_root)
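The training images sit one level deeper than the dev and test1 images, which is easy to get wrong when locating a pair by hand. The sketch below builds both image paths of a pair from a name stem such as `train-10108-0` or `dev-0-0`. It assumes every pair has an `img0` and an `img1` file and that the numbered training directory is known from the annotations; `nlvr2_image_paths` is a hypothetical helper, not a function from the repository.

```python
import os


def nlvr2_image_paths(root: str, stem: str, directory=None) -> tuple:
    """Return the (img0, img1) paths for one image pair.

    Assumptions: `stem` starts with the split name ("train", "dev", "test1");
    training images live under images/train/<directory>/, other splits
    directly under <split>/.
    """
    split = stem.split("-")[0]
    if split == "train":
        if directory is None:
            raise ValueError("training images need their numbered directory")
        base = os.path.join(root, "images", "train", str(directory))
    else:
        base = os.path.join(root, split)
    return tuple(os.path.join(base, f"{stem}-img{i}.png") for i in (0, 1))
```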
WikiBK (Text only data)
from vlmo.utils.write_wikibk import make_arrow
make_arrow(root, arrows_root)
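Since every dataset follows the same import-and-call pattern, the conversions can be driven from one place. The sketch below collects the module paths used in the snippets above into a lookup table; the short dataset keys and the `convert` wrapper are hypothetical conveniences, and the call assumes each `make_arrow` takes `(root, arrows_root)` as shown in this document.

```python
import importlib

# Module paths as they appear in the per-dataset snippets above.
WRITER_MODULES = {
    "gcc": "vlmo.utils.write_conceptual_caption",
    "sbu": "vlmo.utils.write_sbu",
    "vg": "vlmo.utils.write_vg",
    "coco": "vlmo.utils.write_coco_karpathy",
    "f30k": "vlmo.utils.write_f30k_karpathy",
    "vqav2": "vlmo.utils.write_vqa",
    "nlvr2": "vlmo.utils.write_nlvr2",
    "wikibk": "vlmo.utils.write_wikibk",
}


def convert(name: str, root: str, arrows_root: str) -> None:
    """Import the writer module for `name` and run its make_arrow function."""
    if name not in WRITER_MODULES:
        raise ValueError(f"unknown dataset {name!r}; choose from {sorted(WRITER_MODULES)}")
    module = importlib.import_module(WRITER_MODULES[name])
    module.make_arrow(root, arrows_root)
```

Importing lazily inside `convert` means a missing dependency in one writer does not block the others.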