DiT for Image Classification

This folder contains instructions for running image classification with DiT on RVL-CDIP.

Usage

Data Preparation

RVL-CDIP

Download rvl-cdip.tar.gz (~37 GB) from this link, then extract it to /path/to/rvlcdip (the directory passed as --data_path below).
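
A minimal sketch of the extract step, assuming the archive has been downloaded to the current directory and /path/to/rvlcdip is the target location (both are placeholders):

# create the target directory and unpack the archive into it
mkdir -p /path/to/rvlcdip
tar -xzvf rvl-cdip.tar.gz -C /path/to/rvlcdip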

Evaluation

The following command provides an example of evaluating a fine-tuned checkpoint.

# --model can be beit_base_patch16_224 or beit_large_patch16_224
python -m torch.distributed.launch --nproc_per_node=8 --master_port=47770 run_class_finetuning.py \
        --model beit_base_patch16_224 \
        --data_path "/path/to/rvlcdip" \
        --eval_data_path "/path/to/rvlcdip" \
        --enable_deepspeed \
        --nb_classes 16 \
        --eval \
        --data_set rvlcdip \
        --finetune /path/to/model.pth \
        --output_dir output_dir \
        --log_dir output_dir/tf \
        --batch_size 256 \
        --abs_pos_emb \
        --disable_rel_pos_bias
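
Note that the command above passes --enable_deepspeed, so DeepSpeed has to be available in the environment; if it is not already installed, something like the following should work in a standard pip setup:

pip install deepspeed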

Training

Fine-tune DiT on RVL-CDIP:

exp_name=dit-base-exp

mkdir -p output/${exp_name}
# --model can be beit_base_patch16_224 or beit_large_patch16_224
python -m torch.distributed.launch --nproc_per_node=8 run_class_finetuning.py \
        --model beit_base_patch16_224 \
        --data_path "/path/to/rvlcdip" \
        --eval_data_path "/path/to/rvlcdip" \
        --nb_classes 16 \
        --data_set rvlcdip \
        --finetune /path/to/model.pth \
        --output_dir output/${exp_name}/ \
        --log_dir output/${exp_name}/tf \
        --batch_size 64 \
        --lr 5e-4 \
        --update_freq 2 \
        --eval_freq 10 \
        --save_ckpt_freq 10 \
        --warmup_epochs 20 \
        --epochs 180 \
        --layer_scale_init_value 1e-5 \
        --layer_decay 0.75 \
        --drop_path 0.2 \
        --weight_decay 0.05 \
        --clip_grad 1.0 \
        --abs_pos_emb \
        --disable_rel_pos_bias
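
With the settings above, the effective global batch size is batch_size × update_freq × nproc_per_node = 64 × 2 × 8 = 1024. If you train with a different GPU count or per-GPU batch size, a common rule of thumb is to adjust --update_freq (and possibly --lr) so that this product stays roughly the same.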

Citation

If you find this repository useful, please consider citing our work:

@misc{li2022dit,
    title={DiT: Self-supervised Pre-training for Document Image Transformer},
    author={Junlong Li and Yiheng Xu and Tengchao Lv and Lei Cui and Cha Zhang and Furu Wei},
    year={2022},
    eprint={2203.02378},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}

Acknowledgment

This part is built on the timm library, the BEiT repository, the DeiT repository, and the DINO repository.