Model Card for mae_hiera_hiera_abswin_base
This is a Masked Autoencoder (MAE) model based on the Hiera (Hierarchical Vision Transformer) architecture.
Important: This model is specifically designed for continued masked image modeling (MIM) pre-training only. If you're looking to fine-tune on downstream tasks, please use the pre-trained encoder directly at https://huggingface.co/birder-project/hiera_abswin_base_mim
Model Details
Model Type: Masked image modeling
Model Stats:
- Params (M): 85.9
- Input image size: 224 x 224
Dataset: Trained on a diverse dataset of approximately 12M images, including:
- iNaturalist 2021 (~2.6M)
- WebVision-2.0 (~1.5M random subset)
- imagenet-w21-webp-wds (~1M random subset)
- SA-1B (~220K random subset of 20 chunks)
- COCO (~120K)
- NABirds (~48K)
- GLDv2 (~40K random subset of 6 chunks)
- Birdsnap v1.1 (~44K)
- CUB-200 2011 (~11K)
- The Birder dataset (~6M, private dataset)
Papers:
- Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles: https://arxiv.org/abs/2306.00989
- Window Attention is Bugged: How not to Interpolate Position Embeddings: https://arxiv.org/abs/2311.05613
Model Usage
The Birder CLI provides the easiest way to continue MIM training with this model. However, if you need to integrate the model into your own training scripts, you can use the Python API to load the checkpoint as shown below.
Continue Training with Birder CLI
First download the model weights:
python -m birder.tools download-model mae_hiera_hiera_abswin_base
Then run training:
python -m birder.scripts.train_mim --network mae_hiera --encoder hiera_abswin_base --pretrained --opt adamw --lr 0.0008 --opt-betas 0.9 0.95 --lr-scheduler cosine --warmup-epochs 40 --epochs 400 --batch-size 512 --wd 0.05 --encoder-model-config drop_path_rate=0.2 --amp --compile --compile-opt --find-unused-parameters --data-path data/training
Python API
import torch
from birder.common import fs_ops
device = torch.device("cuda")
(net, training_states) = fs_ops.load_mim_checkpoint(
device,
"mae_hiera",
encoder="hiera_abswin_base",
epoch=None,
)
The Python code snippet above loads the model architecture but does not download weights - use the Birder CLI download command to get the actual trained parameters
Citation
@misc{ryali2023hierahierarchicalvisiontransformer,
title={Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles},
author={Chaitanya Ryali and Yuan-Ting Hu and Daniel Bolya and Chen Wei and Haoqi Fan and Po-Yao Huang and Vaibhav Aggarwal and Arkabandhu Chowdhury and Omid Poursaeed and Judy Hoffman and Jitendra Malik and Yanghao Li and Christoph Feichtenhofer},
year={2023},
eprint={2306.00989},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2306.00989},
}
@misc{bolya2023windowattentionbuggedinterpolate,
title={Window Attention is Bugged: How not to Interpolate Position Embeddings},
author={Daniel Bolya and Chaitanya Ryali and Judy Hoffman and Christoph Feichtenhofer},
year={2023},
eprint={2311.05613},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2311.05613},
}
- Downloads last month
- 47