timm documentation

Changelog

You are viewing v1.0.9 version. A newer version v1.0.11 is available.
Hugging Face's logo
Join the Hugging Face community

and get access to the augmented documentation experience

to get started

Changelog

Aug 8, 2024

July 28, 2024

  • Add mobilenet_edgetpu_v2_m weights w/ ra4 mnv4-small based recipe. 80.1% top-1 @ 224 and 80.7 @ 256.
  • Release 1.0.8

July 26, 2024

  • More MobileNet-v4 weights, ImageNet-12k pretrain w/ fine-tunes, and anti-aliased ConvLarge models
model top1 top1_err top5 top5_err param_count img_size
mobilenetv4_conv_aa_large.e230_r448_in12k_ft_in1k 84.99 15.01 97.294 2.706 32.59 544
mobilenetv4_conv_aa_large.e230_r384_in12k_ft_in1k 84.772 15.228 97.344 2.656 32.59 480
mobilenetv4_conv_aa_large.e230_r448_in12k_ft_in1k 84.64 15.36 97.114 2.886 32.59 448
mobilenetv4_conv_aa_large.e230_r384_in12k_ft_in1k 84.314 15.686 97.102 2.898 32.59 384
mobilenetv4_conv_aa_large.e600_r384_in1k 83.824 16.176 96.734 3.266 32.59 480
mobilenetv4_conv_aa_large.e600_r384_in1k 83.244 16.756 96.392 3.608 32.59 384
mobilenetv4_hybrid_medium.e200_r256_in12k_ft_in1k 82.99 17.01 96.67 3.33 11.07 320
mobilenetv4_hybrid_medium.e200_r256_in12k_ft_in1k 82.364 17.636 96.256 3.744 11.07 256
model top1 top1_err top5 top5_err param_count img_size
efficientnet_b0.ra4_e3600_r224_in1k 79.364 20.636 94.754 5.246 5.29 256
efficientnet_b0.ra4_e3600_r224_in1k 78.584 21.416 94.338 5.662 5.29 224
mobilenetv1_100h.ra4_e3600_r224_in1k 76.596 23.404 93.272 6.728 5.28 256
mobilenetv1_100.ra4_e3600_r224_in1k 76.094 23.906 93.004 6.996 4.23 256
mobilenetv1_100h.ra4_e3600_r224_in1k 75.662 24.338 92.504 7.496 5.28 224
mobilenetv1_100.ra4_e3600_r224_in1k 75.382 24.618 92.312 7.688 4.23 224
  • Prototype of set_input_size() added to vit and swin v1/v2 models to allow changing image size, patch size, window size after model creation.
  • Improved support in swin for different size handling, in addition to set_input_size, always_partition and strict_img_size args have been added to __init__ to allow more flexible input size constraints
  • Fix out of order indices info for intermediate ‘Getter’ feature wrapper, check out or range indices for same.
  • Add several tiny < .5M param models for testing that are actually trained on ImageNet-1k
model top1 top1_err top5 top5_err param_count img_size crop_pct
test_efficientnet.r160_in1k 47.156 52.844 71.726 28.274 0.36 192 1.0
test_byobnet.r160_in1k 46.698 53.302 71.674 28.326 0.46 192 1.0
test_efficientnet.r160_in1k 46.426 53.574 70.928 29.072 0.36 160 0.875
test_byobnet.r160_in1k 45.378 54.622 70.572 29.428 0.46 160 0.875
test_vit.r160_in1k 42.0 58.0 68.664 31.336 0.37 192 1.0
test_vit.r160_in1k 40.822 59.178 67.212 32.788 0.37 160 0.875
  • Fix vit reg token init, thanks Promisery
  • Other misc fixes

June 24, 2024

  • 3 more MobileNetV4 hyrid weights with different MQA weight init scheme
model top1 top1_err top5 top5_err param_count img_size
mobilenetv4_hybrid_large.ix_e600_r384_in1k 84.356 15.644 96.892 3.108 37.76 448
mobilenetv4_hybrid_large.ix_e600_r384_in1k 83.990 16.010 96.702 3.298 37.76 384
mobilenetv4_hybrid_medium.ix_e550_r384_in1k 83.394 16.606 96.760 3.240 11.07 448
mobilenetv4_hybrid_medium.ix_e550_r384_in1k 82.968 17.032 96.474 3.526 11.07 384
mobilenetv4_hybrid_medium.ix_e550_r256_in1k 82.492 17.508 96.278 3.722 11.07 320
mobilenetv4_hybrid_medium.ix_e550_r256_in1k 81.446 18.554 95.704 4.296 11.07 256
  • florence2 weight loading in DaViT model

June 12, 2024

  • MobileNetV4 models and initial set of timm trained weights added:
model top1 top1_err top5 top5_err param_count img_size
mobilenetv4_hybrid_large.e600_r384_in1k 84.266 15.734 96.936 3.064 37.76 448
mobilenetv4_hybrid_large.e600_r384_in1k 83.800 16.200 96.770 3.230 37.76 384
mobilenetv4_conv_large.e600_r384_in1k 83.392 16.608 96.622 3.378 32.59 448
mobilenetv4_conv_large.e600_r384_in1k 82.952 17.048 96.266 3.734 32.59 384
mobilenetv4_conv_large.e500_r256_in1k 82.674 17.326 96.31 3.69 32.59 320
mobilenetv4_conv_large.e500_r256_in1k 81.862 18.138 95.69 4.31 32.59 256
mobilenetv4_hybrid_medium.e500_r224_in1k 81.276 18.724 95.742 4.258 11.07 256
mobilenetv4_conv_medium.e500_r256_in1k 80.858 19.142 95.768 4.232 9.72 320
mobilenetv4_hybrid_medium.e500_r224_in1k 80.442 19.558 95.38 4.62 11.07 224
mobilenetv4_conv_blur_medium.e500_r224_in1k 80.142 19.858 95.298 4.702 9.72 256
mobilenetv4_conv_medium.e500_r256_in1k 79.928 20.072 95.184 4.816 9.72 256
mobilenetv4_conv_medium.e500_r224_in1k 79.808 20.192 95.186 4.814 9.72 256
mobilenetv4_conv_blur_medium.e500_r224_in1k 79.438 20.562 94.932 5.068 9.72 224
mobilenetv4_conv_medium.e500_r224_in1k 79.094 20.906 94.77 5.23 9.72 224
mobilenetv4_conv_small.e2400_r224_in1k 74.616 25.384 92.072 7.928 3.77 256
mobilenetv4_conv_small.e1200_r224_in1k 74.292 25.708 92.116 7.884 3.77 256
mobilenetv4_conv_small.e2400_r224_in1k 73.756 26.244 91.422 8.578 3.77 224
mobilenetv4_conv_small.e1200_r224_in1k 73.454 26.546 91.34 8.66 3.77 224
  • Apple MobileCLIP (https://arxiv.org/pdf/2311.17049, FastViT and ViT-B) image tower model support & weights added (part of OpenCLIP support).
  • ViTamin (https://arxiv.org/abs/2404.02132) CLIP image tower model & weights added (part of OpenCLIP support).
  • OpenAI CLIP Modified ResNet image tower modelling & weight support (via ByobNet). Refactor AttentionPool2d.

May 14, 2024

  • Support loading PaliGemma jax weights into SigLIP ViT models with average pooling.
  • Add Hiera models from Meta (https://github.com/facebookresearch/hiera).
  • Add normalize= flag for transorms, return non-normalized torch.Tensor with original dytpe (for chug)
  • Version 1.0.3 release

May 11, 2024

  • Searching for Better ViT Baselines (For the GPU Poor) weights and vit variants released. Exploring model shapes between Tiny and Base.
model top1 top5 param_count img_size
vit_mediumd_patch16_reg4_gap_256.sbb_in12k_ft_in1k 86.202 97.874 64.11 256
vit_betwixt_patch16_reg4_gap_256.sbb_in12k_ft_in1k 85.418 97.48 60.4 256
vit_mediumd_patch16_rope_reg1_gap_256.sbb_in1k 84.322 96.812 63.95 256
vit_betwixt_patch16_rope_reg4_gap_256.sbb_in1k 83.906 96.684 60.23 256
vit_base_patch16_rope_reg1_gap_256.sbb_in1k 83.866 96.67 86.43 256
vit_medium_patch16_rope_reg1_gap_256.sbb_in1k 83.81 96.824 38.74 256
vit_betwixt_patch16_reg4_gap_256.sbb_in1k 83.706 96.616 60.4 256
vit_betwixt_patch16_reg1_gap_256.sbb_in1k 83.628 96.544 60.4 256
vit_medium_patch16_reg4_gap_256.sbb_in1k 83.47 96.622 38.88 256
vit_medium_patch16_reg1_gap_256.sbb_in1k 83.462 96.548 38.88 256
vit_little_patch16_reg4_gap_256.sbb_in1k 82.514 96.262 22.52 256
vit_wee_patch16_reg1_gap_256.sbb_in1k 80.256 95.360 13.42 256
vit_pwee_patch16_reg1_gap_256.sbb_in1k 80.072 95.136 15.25 256
vit_mediumd_patch16_reg4_gap_256.sbb_in12k N/A N/A 64.11 256
vit_betwixt_patch16_reg4_gap_256.sbb_in12k N/A N/A 60.4 256
  • AttentionExtract helper added to extract attention maps from timm models. See example in https://github.com/huggingface/pytorch-image-models/discussions/1232#discussioncomment-9320949
  • forward_intermediates() API refined and added to more models including some ConvNets that have other extraction methods.
  • 1017 of 1047 model architectures support features_only=True feature extraction. Remaining 34 architectures can be supported but based on priority requests.
  • Remove torch.jit.script annotated functions including old JIT activations. Conflict with dynamo and dynamo does a much better job when used.

April 11, 2024

  • Prepping for a long overdue 1.0 release, things have been stable for a while now.
  • Significant feature that’s been missing for a while, features_only=True support for ViT models with flat hidden states or non-std module layouts (so far covering 'vit_*', 'twins_*', 'deit*', 'beit*', 'mvitv2*', 'eva*', 'samvit_*', 'flexivit*')
  • Above feature support achieved through a new forward_intermediates() API that can be used with a feature wrapping module or direclty.
model = timm.create_model('vit_base_patch16_224')
final_feat, intermediates = model.forward_intermediates(input)
output = model.forward_head(final_feat)  # pooling + classifier head

print(final_feat.shape)
torch.Size([2, 197, 768])

for f in intermediates:
    print(f.shape)
torch.Size([2, 768, 14, 14])
torch.Size([2, 768, 14, 14])
torch.Size([2, 768, 14, 14])
torch.Size([2, 768, 14, 14])
torch.Size([2, 768, 14, 14])
torch.Size([2, 768, 14, 14])
torch.Size([2, 768, 14, 14])
torch.Size([2, 768, 14, 14])
torch.Size([2, 768, 14, 14])
torch.Size([2, 768, 14, 14])
torch.Size([2, 768, 14, 14])
torch.Size([2, 768, 14, 14])

print(output.shape)
torch.Size([2, 1000])
model = timm.create_model('eva02_base_patch16_clip_224', pretrained=True, img_size=512, features_only=True, out_indices=(-3, -2,))
output = model(torch.randn(2, 3, 512, 512))

for o in output:
    print(o.shape)
torch.Size([2, 768, 32, 32])
torch.Size([2, 768, 32, 32])
  • TinyCLIP vision tower weights added, thx Thien Tran

Feb 19, 2024

  • Next-ViT models added. Adapted from https://github.com/bytedance/Next-ViT
  • HGNet and PP-HGNetV2 models added. Adapted from https://github.com/PaddlePaddle/PaddleClas by SeeFun
  • Removed setup.py, moved to pyproject.toml based build supported by PDM
  • Add updated model EMA impl using _for_each for less overhead
  • Support device args in train script for non GPU devices
  • Other misc fixes and small additions
  • Min supported Python version increased to 3.8
  • Release 0.9.16

Jan 8, 2024

Datasets & transform refactoring

  • HuggingFace streaming (iterable) dataset support (--dataset hfids:org/dataset)
  • Webdataset wrapper tweaks for improved split info fetching, can auto fetch splits from supported HF hub webdataset
  • Tested HF datasets and webdataset wrapper streaming from HF hub with recent timm ImageNet uploads to https://huggingface.co/timm
  • Make input & target column/field keys consistent across datasets and pass via args
  • Full monochrome support when using e:g: --input-size 1 224 224 or --in-chans 1, sets PIL image conversion appropriately in dataset
  • Improved several alternate crop & resize transforms (ResizeKeepRatio, RandomCropOrPad, etc) for use in PixParse document AI project
  • Add SimCLR style color jitter prob along with grayscale and gaussian blur options to augmentations and args
  • Allow train without validation set (--val-split '') in train script
  • Add --bce-sum (sum over class dim) and --bce-pos-weight (positive weighting) args for training as they’re common BCE loss tweaks I was often hard coding

Nov 23, 2023

  • Added EfficientViT-Large models, thanks SeeFun
  • Fix Python 3.7 compat, will be dropping support for it soon
  • Other misc fixes
  • Release 0.9.12

Nov 20, 2023

Nov 3, 2023

Oct 20, 2023

  • SigLIP image tower weights supported in vision_transformer.py.
    • Great potential for fine-tune and downstream feature use.
  • Experimental ‘register’ support in vit models as per Vision Transformers Need Registers
  • Updated RepViT with new weight release. Thanks wangao
  • Add patch resizing support (on pretrained weight load) to Swin models
  • 0.9.8 release pending

Sep 1, 2023

  • TinyViT added by SeeFun
  • Fix EfficientViT (MIT) to use torch.autocast so it works back to PT 1.10
  • 0.9.7 release

Aug 28, 2023

  • Add dynamic img size support to models in vision_transformer.py, vision_transformer_hybrid.py, deit.py, and eva.py w/o breaking backward compat.
    • Add dynamic_img_size=True to args at model creation time to allow changing the grid size (interpolate abs and/or ROPE pos embed each forward pass).
    • Add dynamic_img_pad=True to allow image sizes that aren’t divisible by patch size (pad bottom right to patch size each forward pass).
    • Enabling either dynamic mode will break FX tracing unless PatchEmbed module added as leaf.
    • Existing method of resizing position embedding by passing different img_size (interpolate pretrained embed weights once) on creation still works.
    • Existing method of changing patch_size (resize pretrained patch_embed weights once) on creation still works.
    • Example validation cmd python validate.py /imagenet --model vit_base_patch16_224 --amp --amp-dtype bfloat16 --img-size 255 --crop-pct 1.0 --model-kwargs dynamic_img_size=True dyamic_img_pad=True

Aug 25, 2023

Aug 11, 2023

  • Swin, MaxViT, CoAtNet, and BEiT models support resizing of image/window size on creation with adaptation of pretrained weights
  • Example validation cmd to test w/ non-square resize python validate.py /imagenet --model swin_base_patch4_window7_224.ms_in22k_ft_in1k --amp --amp-dtype bfloat16 --input-size 3 256 320 --model-kwargs window_size=8,10 img_size=256,320

Aug 3, 2023

  • Add GluonCV weights for HRNet w18_small and w18_small_v2. Converted by SeeFun
  • Fix selecsls* model naming regression
  • Patch and position embedding for ViT/EVA works for bfloat16/float16 weights on load (or activations for on-the-fly resize)
  • v0.9.5 release prep

July 27, 2023

  • Added timm trained seresnextaa201d_32x8d.sw_in12k_ft_in1k_384 weights (and .sw_in12k pretrain) with 87.3% top-1 on ImageNet-1k, best ImageNet ResNet family model I’m aware of.
  • RepViT model and weights (https://arxiv.org/abs/2307.09283) added by wangao
  • I-JEPA ViT feature weights (no classifier) added by SeeFun
  • SAM-ViT (segment anything) feature weights (no classifier) added by SeeFun
  • Add support for alternative feat extraction methods and -ve indices to EfficientNet
  • Add NAdamW optimizer
  • Misc fixes

May 11, 2023

  • timm 0.9 released, transition from 0.8.xdev releases

May 10, 2023

  • Hugging Face Hub downloading is now default, 1132 models on https://huggingface.co/timm, 1163 weights in timm
  • DINOv2 vit feature backbone weights added thanks to Leng Yue
  • FB MAE vit feature backbone weights added
  • OpenCLIP DataComp-XL L/14 feat backbone weights added
  • MetaFormer (poolformer-v2, caformer, convformer, updated poolformer (v1)) w/ weights added by Fredo Guan
  • Experimental get_intermediate_layers function on vit/deit models for grabbing hidden states (inspired by DINO impl). This is WIP and may change significantly… feedback welcome.
  • Model creation throws error if pretrained=True and no weights exist (instead of continuing with random initialization)
  • Fix regression with inception / nasnet TF sourced weights with 1001 classes in original classifiers
  • bitsandbytes (https://github.com/TimDettmers/bitsandbytes) optimizers added to factory, use bnb prefix, ie bnbadam8bit
  • Misc cleanup and fixes
  • Final testing before switching to a 0.9 and bringing timm out of pre-release state

April 27, 2023

  • 97% of timm models uploaded to HF Hub and almost all updated to support multi-weight pretrained configs
  • Minor cleanup and refactoring of another batch of models as multi-weight added. More fused_attn (F.sdpa) and features_only support, and torchscript fixes.

April 21, 2023

  • Gradient accumulation support added to train script and tested (--grad-accum-steps), thanks Taeksang Kim
  • More weights on HF Hub (cspnet, cait, volo, xcit, tresnet, hardcorenas, densenet, dpn, vovnet, xception_aligned)
  • Added --head-init-scale and --head-init-bias to train.py to scale classiifer head and set fixed bias for fine-tune
  • Remove all InplaceABN (inplace_abn) use, replaced use in tresnet with standard BatchNorm (modified weights accordingly).

April 12, 2023

  • Add ONNX export script, validate script, helpers that I’ve had kicking around for along time. Tweak ‘same’ padding for better export w/ recent ONNX + pytorch.
  • Refactor dropout args for vit and vit-like models, separate drop_rate into drop_rate (classifier dropout), proj_drop_rate (block mlp / out projections), pos_drop_rate (position embedding drop), attn_drop_rate (attention dropout). Also add patch dropout (FLIP) to vit and eva models.
  • fused F.scaled_dot_product_attention support to more vit models, add env var (TIMM_FUSED_ATTN) to control, and config interface to enable/disable
  • Add EVA-CLIP backbones w/ image tower weights, all the way up to 4B param ‘enormous’ model, and 336x336 OpenAI ViT mode that was missed.

April 5, 2023

  • ALL ResNet models pushed to Hugging Face Hub with multi-weight support
  • New ImageNet-12k + ImageNet-1k fine-tunes available for a few anti-aliased ResNet models
    • resnetaa50d.sw_in12k_ft_in1k - 81.7 @ 224, 82.6 @ 288
    • resnetaa101d.sw_in12k_ft_in1k - 83.5 @ 224, 84.1 @ 288
    • seresnextaa101d_32x8d.sw_in12k_ft_in1k - 86.0 @ 224, 86.5 @ 288
    • seresnextaa101d_32x8d.sw_in12k_ft_in1k_288 - 86.5 @ 288, 86.7 @ 320

March 31, 2023

  • Add first ConvNext-XXLarge CLIP -> IN-1k fine-tune and IN-12k intermediate fine-tunes for convnext-base/large CLIP models.
model top1 top5 img_size param_count gmacs macts
convnext_xxlarge.clip_laion2b_soup_ft_in1k 88.612 98.704 256 846.47 198.09 124.45
convnext_large_mlp.clip_laion2b_soup_ft_in12k_in1k_384 88.312 98.578 384 200.13 101.11 126.74
convnext_large_mlp.clip_laion2b_soup_ft_in12k_in1k_320 87.968 98.47 320 200.13 70.21 88.02
convnext_base.clip_laion2b_augreg_ft_in12k_in1k_384 87.138 98.212 384 88.59 45.21 84.49
convnext_base.clip_laion2b_augreg_ft_in12k_in1k 86.344 97.97 256 88.59 20.09 37.55
  • Add EVA-02 MIM pretrained and fine-tuned weights, push to HF hub and update model cards for all EVA models. First model over 90% top-1 (99% top-5)! Check out the original code & weights at https://github.com/baaivision/EVA for more details on their work blending MIM, CLIP w/ many model, dataset, and train recipe tweaks.
model top1 top5 param_count img_size
eva02_large_patch14_448.mim_m38m_ft_in22k_in1k 90.054 99.042 305.08 448
eva02_large_patch14_448.mim_in22k_ft_in22k_in1k 89.946 99.01 305.08 448
eva_giant_patch14_560.m30m_ft_in22k_in1k 89.792 98.992 1014.45 560
eva02_large_patch14_448.mim_in22k_ft_in1k 89.626 98.954 305.08 448
eva02_large_patch14_448.mim_m38m_ft_in1k 89.57 98.918 305.08 448
eva_giant_patch14_336.m30m_ft_in22k_in1k 89.56 98.956 1013.01 336
eva_giant_patch14_336.clip_ft_in1k 89.466 98.82 1013.01 336
eva_large_patch14_336.in22k_ft_in22k_in1k 89.214 98.854 304.53 336
eva_giant_patch14_224.clip_ft_in1k 88.882 98.678 1012.56 224
eva02_base_patch14_448.mim_in22k_ft_in22k_in1k 88.692 98.722 87.12 448
eva_large_patch14_336.in22k_ft_in1k 88.652 98.722 304.53 336
eva_large_patch14_196.in22k_ft_in22k_in1k 88.592 98.656 304.14 196
eva02_base_patch14_448.mim_in22k_ft_in1k 88.23 98.564 87.12 448
eva_large_patch14_196.in22k_ft_in1k 87.934 98.504 304.14 196
eva02_small_patch14_336.mim_in22k_ft_in1k 85.74 97.614 22.13 336
eva02_tiny_patch14_336.mim_in22k_ft_in1k 80.658 95.524 5.76 336
  • Multi-weight and HF hub for DeiT and MLP-Mixer based models

March 22, 2023

  • More weights pushed to HF hub along with multi-weight support, including: regnet.py, rexnet.py, byobnet.py, resnetv2.py, swin_transformer.py, swin_transformer_v2.py, swin_transformer_v2_cr.py
  • Swin Transformer models support feature extraction (NCHW feat maps for swinv2_cr_*, and NHWC for all others) and spatial embedding outputs.
  • FocalNet (from https://github.com/microsoft/FocalNet) models and weights added with significant refactoring, feature extraction, no fixed resolution / sizing constraint
  • RegNet weights increased with HF hub push, SWAG, SEER, and torchvision v2 weights. SEER is pretty poor wrt to performance for model size, but possibly useful.
  • More ImageNet-12k pretrained and 1k fine-tuned timm weights:
    • rexnetr_200.sw_in12k_ft_in1k - 82.6 @ 224, 83.2 @ 288
    • rexnetr_300.sw_in12k_ft_in1k - 84.0 @ 224, 84.5 @ 288
    • regnety_120.sw_in12k_ft_in1k - 85.0 @ 224, 85.4 @ 288
    • regnety_160.lion_in12k_ft_in1k - 85.6 @ 224, 86.0 @ 288
    • regnety_160.sw_in12k_ft_in1k - 85.6 @ 224, 86.0 @ 288 (compare to SWAG PT + 1k FT this is same BUT much lower res, blows SEER FT away)
  • Model name deprecation + remapping functionality added (a milestone for bringing 0.8.x out of pre-release). Mappings being added…
  • Minor bug fixes and improvements.

Feb 26, 2023

  • Add ConvNeXt-XXLarge CLIP pretrained image tower weights for fine-tune & features (fine-tuning TBD) — see model card
  • Update convnext_xxlarge default LayerNorm eps to 1e-5 (for CLIP weights, improved stability)
  • 0.8.15dev0

Feb 20, 2023

  • Add 320x320 convnext_large_mlp.clip_laion2b_ft_320 and convnext_lage_mlp.clip_laion2b_ft_soup_320 CLIP image tower weights for features & fine-tune
  • 0.8.13dev0 pypi release for latest changes w/ move to huggingface org

Feb 16, 2023

  • safetensor checkpoint support added
  • Add ideas from ‘Scaling Vision Transformers to 22 B. Params’ (https://arxiv.org/abs/2302.05442) — qk norm, RmsNorm, parallel block
  • Add F.scaleddot_product_attention support (PyTorch 2.0 only) to `vit, vit_relpos, coatnet/maxxvit` (to start)
  • Lion optimizer (w/ multi-tensor option) added (https://arxiv.org/abs/2302.06675)
  • gradient checkpointing works with features_only=True

Feb 7, 2023

  • New inference benchmark numbers added in results folder.
  • Add convnext LAION CLIP trained weights and initial set of in1k fine-tunes
    • convnext_base.clip_laion2b_augreg_ft_in1k - 86.2% @ 256x256
    • convnext_base.clip_laiona_augreg_ft_in1k_384 - 86.5% @ 384x384
    • convnext_large_mlp.clip_laion2b_augreg_ft_in1k - 87.3% @ 256x256
    • convnext_large_mlp.clip_laion2b_augreg_ft_in1k_384 - 87.9% @ 384x384
  • Add DaViT models. Supports features_only=True. Adapted from https://github.com/dingmyu/davit by Fredo.
  • Use a common NormMlpClassifierHead across MaxViT, ConvNeXt, DaViT
  • Add EfficientFormer-V2 model, update EfficientFormer, and refactor LeViT (closely related architectures). Weights on HF hub.
    • New EfficientFormer-V2 arch, significant refactor from original at (https://github.com/snap-research/EfficientFormer). Supports features_only=True.
    • Minor updates to EfficientFormer.
    • Refactor LeViT models to stages, add features_only=True support to new conv variants, weight remap required.
  • Move ImageNet meta-data (synsets, indices) from /results to timm/data/_info.
  • Add ImageNetInfo / DatasetInfo classes to provide labelling for various ImageNet classifier layouts in timm
    • Update inference.py to use, try: python inference.py /folder/to/images --model convnext_small.in12k --label-type detail --topk 5
  • Ready for 0.8.10 pypi pre-release (final testing).

Jan 20, 2023

  • Add two convnext 12k -> 1k fine-tunes at 384x384

    • convnext_tiny.in12k_ft_in1k_384 - 85.1 @ 384
    • convnext_small.in12k_ft_in1k_384 - 86.2 @ 384
  • Push all MaxxViT weights to HF hub, and add new ImageNet-12k -> 1k fine-tunes for rw base MaxViT and CoAtNet 1/2 models

model top1 top5 samples / sec Params (M) GMAC Act (M)
maxvit_xlarge_tf_512.in21k_ft_in1k 88.53 98.64 21.76 475.77 534.14 1413.22
maxvit_xlarge_tf_384.in21k_ft_in1k 88.32 98.54 42.53 475.32 292.78 668.76
maxvit_base_tf_512.in21k_ft_in1k 88.20 98.53 50.87 119.88 138.02 703.99
maxvit_large_tf_512.in21k_ft_in1k 88.04 98.40 36.42 212.33 244.75 942.15
maxvit_large_tf_384.in21k_ft_in1k 87.98 98.56 71.75 212.03 132.55 445.84
maxvit_base_tf_384.in21k_ft_in1k 87.92 98.54 104.71 119.65 73.80 332.90
maxvit_rmlp_base_rw_384.sw_in12k_ft_in1k 87.81 98.37 106.55 116.14 70.97 318.95
maxxvitv2_rmlp_base_rw_384.sw_in12k_ft_in1k 87.47 98.37 149.49 116.09 72.98 213.74
coatnet_rmlp_2_rw_384.sw_in12k_ft_in1k 87.39 98.31 160.80 73.88 47.69 209.43
maxvit_rmlp_base_rw_224.sw_in12k_ft_in1k 86.89 98.02 375.86 116.14 23.15 92.64
maxxvitv2_rmlp_base_rw_224.sw_in12k_ft_in1k 86.64 98.02 501.03 116.09 24.20 62.77
maxvit_base_tf_512.in1k 86.60 97.92 50.75 119.88 138.02 703.99
coatnet_2_rw_224.sw_in12k_ft_in1k 86.57 97.89 631.88 73.87 15.09 49.22
maxvit_large_tf_512.in1k 86.52 97.88 36.04 212.33 244.75 942.15
coatnet_rmlp_2_rw_224.sw_in12k_ft_in1k 86.49 97.90 620.58 73.88 15.18 54.78
maxvit_base_tf_384.in1k 86.29 97.80 101.09 119.65 73.80 332.90
maxvit_large_tf_384.in1k 86.23 97.69 70.56 212.03 132.55 445.84
maxvit_small_tf_512.in1k 86.10 97.76 88.63 69.13 67.26 383.77
maxvit_tiny_tf_512.in1k 85.67 97.58 144.25 31.05 33.49 257.59
maxvit_small_tf_384.in1k 85.54 97.46 188.35 69.02 35.87 183.65
maxvit_tiny_tf_384.in1k 85.11 97.38 293.46 30.98 17.53 123.42
maxvit_large_tf_224.in1k 84.93 96.97 247.71 211.79 43.68 127.35
coatnet_rmlp_1_rw2_224.sw_in12k_ft_in1k 84.90 96.96 1025.45 41.72 8.11 40.13
maxvit_base_tf_224.in1k 84.85 96.99 358.25 119.47 24.04 95.01
maxxvit_rmlp_small_rw_256.sw_in1k 84.63 97.06 575.53 66.01 14.67 58.38
coatnet_rmlp_2_rw_224.sw_in1k 84.61 96.74 625.81 73.88 15.18 54.78
maxvit_rmlp_small_rw_224.sw_in1k 84.49 96.76 693.82 64.90 10.75 49.30
maxvit_small_tf_224.in1k 84.43 96.83 647.96 68.93 11.66 53.17
maxvit_rmlp_tiny_rw_256.sw_in1k 84.23 96.78 807.21 29.15 6.77 46.92
coatnet_1_rw_224.sw_in1k 83.62 96.38 989.59 41.72 8.04 34.60
maxvit_tiny_rw_224.sw_in1k 83.50 96.50 1100.53 29.06 5.11 33.11
maxvit_tiny_tf_224.in1k 83.41 96.59 1004.94 30.92 5.60 35.78
coatnet_rmlp_1_rw_224.sw_in1k 83.36 96.45 1093.03 41.69 7.85 35.47
maxxvitv2_nano_rw_256.sw_in1k 83.11 96.33 1276.88 23.70 6.26 23.05
maxxvit_rmlp_nano_rw_256.sw_in1k 83.03 96.34 1341.24 16.78 4.37 26.05
maxvit_rmlp_nano_rw_256.sw_in1k 82.96 96.26 1283.24 15.50 4.47 31.92
maxvit_nano_rw_256.sw_in1k 82.93 96.23 1218.17 15.45 4.46 30.28
coatnet_bn_0_rw_224.sw_in1k 82.39 96.19 1600.14 27.44 4.67 22.04
coatnet_0_rw_224.sw_in1k 82.39 95.84 1831.21 27.44 4.43 18.73
coatnet_rmlp_nano_rw_224.sw_in1k 82.05 95.87 2109.09 15.15 2.62 20.34
coatnext_nano_rw_224.sw_in1k 81.95 95.92 2525.52 14.70 2.47 12.80
coatnet_nano_rw_224.sw_in1k 81.70 95.64 2344.52 15.14 2.41 15.41
maxvit_rmlp_pico_rw_256.sw_in1k 80.53 95.21 1594.71 7.52 1.85 24.86

Jan 11, 2023

  • Update ConvNeXt ImageNet-12k pretrain series w/ two new fine-tuned weights (and pre FT .in12k tags)
    • convnext_nano.in12k_ft_in1k - 82.3 @ 224, 82.9 @ 288 (previously released)
    • convnext_tiny.in12k_ft_in1k - 84.2 @ 224, 84.5 @ 288
    • convnext_small.in12k_ft_in1k - 85.2 @ 224, 85.3 @ 288

Jan 6, 2023

  • Finally got around to adding --model-kwargs and --opt-kwargs to scripts to pass through rare args directly to model classes from cmd line
    • train.py /imagenet --model resnet50 --amp --model-kwargs output_stride=16 act_layer=silu
    • train.py /imagenet --model vit_base_patch16_clip_224 --img-size 240 --amp --model-kwargs img_size=240 patch_size=12
  • Cleanup some popular models to better support arg passthrough / merge with model configs, more to go.

Jan 5, 2023

Dec 23, 2022 🎄☃

  • Add FlexiViT models and weights from https://github.com/google-research/big_vision (check out paper at https://arxiv.org/abs/2212.08013)
    • NOTE currently resizing is static on model creation, on-the-fly dynamic / train patch size sampling is a WIP
  • Many more models updated to multi-weight and downloadable via HF hub now (convnext, efficientnet, mobilenet, vision_transformer*, beit)
  • More model pretrained tag and adjustments, some model names changed (working on deprecation translations, consider main branch DEV branch right now, use 0.6.x for stable use)
  • More ImageNet-12k (subset of 22k) pretrain models popping up:
    • efficientnet_b5.in12k_ft_in1k - 85.9 @ 448x448
    • vit_medium_patch16_gap_384.in12k_ft_in1k - 85.5 @ 384x384
    • vit_medium_patch16_gap_256.in12k_ft_in1k - 84.5 @ 256x256
    • convnext_nano.in12k_ft_in1k - 82.9 @ 288x288

Dec 8, 2022

  • Add ‘EVA l’ to vision_transformer.py, MAE style ViT-L/14 MIM pretrain w/ EVA-CLIP targets, FT on ImageNet-1k (w/ ImageNet-22k intermediate for some)
model top1 param_count gmac macts hub
eva_large_patch14_336.in22k_ft_in22k_in1k 89.2 304.5 191.1 270.2 link
eva_large_patch14_336.in22k_ft_in1k 88.7 304.5 191.1 270.2 link
eva_large_patch14_196.in22k_ft_in22k_in1k 88.6 304.1 61.6 63.5 link
eva_large_patch14_196.in22k_ft_in1k 87.9 304.1 61.6 63.5 link

Dec 6, 2022

model top1 param_count gmac macts hub
eva_giant_patch14_560.m30m_ft_in22k_in1k 89.8 1014.4 1906.8 2577.2 link
eva_giant_patch14_336.m30m_ft_in22k_in1k 89.6 1013 620.6 550.7 link
eva_giant_patch14_336.clip_ft_in1k 89.4 1013 620.6 550.7 link
eva_giant_patch14_224.clip_ft_in1k 89.1 1012.6 267.2 192.6 link

Dec 5, 2022

  • Pre-release (0.8.0dev0) of multi-weight support (model_arch.pretrained_tag). Install with pip install --pre timm
    • vision_transformer, maxvit, convnext are the first three model impl w/ support
    • model names are changing with this (previous _21k, etc. fn will merge), still sorting out deprecation handling
    • bugs are likely, but I need feedback so please try it out
    • if stability is needed, please use 0.6.x pypi releases or clone from 0.6.x branch
  • Support for PyTorch 2.0 compile is added in train/validate/inference/benchmark, use --torchcompile argument
  • Inference script allows more control over output, select k for top-class index + prob json, csv or parquet output
  • Add a full set of fine-tuned CLIP image tower weights from both LAION-2B and original OpenAI CLIP models
model top1 param_count gmac macts hub
vit_huge_patch14_clip_336.laion2b_ft_in12k_in1k 88.6 632.5 391 407.5 link
vit_large_patch14_clip_336.openai_ft_in12k_in1k 88.3 304.5 191.1 270.2 link
vit_huge_patch14_clip_224.laion2b_ft_in12k_in1k 88.2 632 167.4 139.4 link
vit_large_patch14_clip_336.laion2b_ft_in12k_in1k 88.2 304.5 191.1 270.2 link
vit_large_patch14_clip_224.openai_ft_in12k_in1k 88.2 304.2 81.1 88.8 link
vit_large_patch14_clip_224.laion2b_ft_in12k_in1k 87.9 304.2 81.1 88.8 link
vit_large_patch14_clip_224.openai_ft_in1k 87.9 304.2 81.1 88.8 link
vit_large_patch14_clip_336.laion2b_ft_in1k 87.9 304.5 191.1 270.2 link
vit_huge_patch14_clip_224.laion2b_ft_in1k 87.6 632 167.4 139.4 link
vit_large_patch14_clip_224.laion2b_ft_in1k 87.3 304.2 81.1 88.8 link
vit_base_patch16_clip_384.laion2b_ft_in12k_in1k 87.2 86.9 55.5 101.6 link
vit_base_patch16_clip_384.openai_ft_in12k_in1k 87 86.9 55.5 101.6 link
vit_base_patch16_clip_384.laion2b_ft_in1k 86.6 86.9 55.5 101.6 link
vit_base_patch16_clip_384.openai_ft_in1k 86.2 86.9 55.5 101.6 link
vit_base_patch16_clip_224.laion2b_ft_in12k_in1k 86.2 86.6 17.6 23.9 link
vit_base_patch16_clip_224.openai_ft_in12k_in1k 85.9 86.6 17.6 23.9 link
vit_base_patch32_clip_448.laion2b_ft_in12k_in1k 85.8 88.3 17.9 23.9 link
vit_base_patch16_clip_224.laion2b_ft_in1k 85.5 86.6 17.6 23.9 link
vit_base_patch32_clip_384.laion2b_ft_in12k_in1k 85.4 88.3 13.1 16.5 link
vit_base_patch16_clip_224.openai_ft_in1k 85.3 86.6 17.6 23.9 link
vit_base_patch32_clip_384.openai_ft_in12k_in1k 85.2 88.3 13.1 16.5 link
vit_base_patch32_clip_224.laion2b_ft_in12k_in1k 83.3 88.2 4.4 5 link
vit_base_patch32_clip_224.laion2b_ft_in1k 82.6 88.2 4.4 5 link
vit_base_patch32_clip_224.openai_ft_in1k 81.9 88.2 4.4 5 link
  • Port of MaxViT Tensorflow Weights from official impl at https://github.com/google-research/maxvit
    • There was larger than expected drops for the upscaled 384/512 in21k fine-tune weights, possible detail missing, but the 21k FT did seem sensitive to small preprocessing
model top1 param_count gmac macts hub
maxvit_xlarge_tf_512.in21k_ft_in1k 88.5 475.8 534.1 1413.2 link
maxvit_xlarge_tf_384.in21k_ft_in1k 88.3 475.3 292.8 668.8 link
maxvit_base_tf_512.in21k_ft_in1k 88.2 119.9 138 704 link
maxvit_large_tf_512.in21k_ft_in1k 88 212.3 244.8 942.2 link
maxvit_large_tf_384.in21k_ft_in1k 88 212 132.6 445.8 link
maxvit_base_tf_384.in21k_ft_in1k 87.9 119.6 73.8 332.9 link
maxvit_base_tf_512.in1k 86.6 119.9 138 704 link
maxvit_large_tf_512.in1k 86.5 212.3 244.8 942.2 link
maxvit_base_tf_384.in1k 86.3 119.6 73.8 332.9 link
maxvit_large_tf_384.in1k 86.2 212 132.6 445.8 link
maxvit_small_tf_512.in1k 86.1 69.1 67.3 383.8 link
maxvit_tiny_tf_512.in1k 85.7 31 33.5 257.6 link
maxvit_small_tf_384.in1k 85.5 69 35.9 183.6 link
maxvit_tiny_tf_384.in1k 85.1 31 17.5 123.4 link
maxvit_large_tf_224.in1k 84.9 211.8 43.7 127.4 link
maxvit_base_tf_224.in1k 84.9 119.5 24 95 link
maxvit_small_tf_224.in1k 84.4 68.9 11.7 53.2 link
maxvit_tiny_tf_224.in1k 83.4 30.9 5.6 35.8 link

Oct 15, 2022

  • Train and validation script enhancements
  • Non-GPU (ie CPU) device support
  • SLURM compatibility for train script
  • HF datasets support (via ReaderHfds)
  • TFDS/WDS dataloading improvements (sample padding/wrap for distributed use fixed wrt sample count estimate)
  • in_chans !=3 support for scripts / loader
  • Adan optimizer
  • Can enable per-step LR scheduling via args
  • Dataset ‘parsers’ renamed to ‘readers’, more descriptive of purpose
  • AMP args changed, APEX via --amp-impl apex, bfloat16 supportedf via --amp-dtype bfloat16
  • main branch switched to 0.7.x version, 0.6x forked for stable release of weight only adds
  • master -> main branch rename

Oct 10, 2022

  • More weights in maxxvit series, incl first ConvNeXt block based coatnext and maxxvit experiments:
    • coatnext_nano_rw_224 - 82.0 @ 224 (G) — (uses ConvNeXt conv block, no BatchNorm)
    • maxxvit_rmlp_nano_rw_256 - 83.0 @ 256, 83.7 @ 320 (G) (uses ConvNeXt conv block, no BN)
    • maxvit_rmlp_small_rw_224 - 84.5 @ 224, 85.1 @ 320 (G)
    • maxxvit_rmlp_small_rw_256 - 84.6 @ 256, 84.9 @ 288 (G) — could be trained better, hparams need tuning (uses ConvNeXt block, no BN)
    • coatnet_rmlp_2_rw_224 - 84.6 @ 224, 85 @ 320 (T)
    • NOTE: official MaxVit weights (in1k) have been released at https://github.com/google-research/maxvit — some extra work is needed to port and adapt since my impl was created independently of theirs and has a few small differences + the whole TF same padding fun.

Sept 23, 2022

  • LAION-2B CLIP image towers supported as pretrained backbones for fine-tune or features (no classifier)
    • vit_base_patch32_224_clip_laion2b
    • vit_large_patch14_224_clip_laion2b
    • vit_huge_patch14_224_clip_laion2b
    • vit_giant_patch14_224_clip_laion2b

Sept 7, 2022

  • Hugging Face timm docs home now exists, look for more here in the future
  • Add BEiT-v2 weights for base and large 224x224 models from https://github.com/microsoft/unilm/tree/master/beit2
  • Add more weights in maxxvit series incl a pico (7.5M params, 1.9 GMACs), two tiny variants:
    • maxvit_rmlp_pico_rw_256 - 80.5 @ 256, 81.3 @ 320 (T)
    • maxvit_tiny_rw_224 - 83.5 @ 224 (G)
    • maxvit_rmlp_tiny_rw_256 - 84.2 @ 256, 84.8 @ 320 (T)

Aug 29, 2022

  • MaxVit window size scales with img_size by default. Add new RelPosMlp MaxViT weight that leverages this:
    • maxvit_rmlp_nano_rw_256 - 83.0 @ 256, 83.6 @ 320 (T)

Aug 26, 2022

Aug 15, 2022

  • ConvNeXt atto weights added
    • convnext_atto - 75.7 @ 224, 77.0 @ 288
    • convnext_atto_ols - 75.9 @ 224, 77.2 @ 288

Aug 5, 2022

  • More custom ConvNeXt smaller model defs with weights
    • convnext_femto - 77.5 @ 224, 78.7 @ 288
    • convnext_femto_ols - 77.9 @ 224, 78.9 @ 288
    • convnext_pico - 79.5 @ 224, 80.4 @ 288
    • convnext_pico_ols - 79.5 @ 224, 80.5 @ 288
    • convnext_nano_ols - 80.9 @ 224, 81.6 @ 288
  • Updated EdgeNeXt to improve ONNX export, add new base variant and weights from original (https://github.com/mmaaz60/EdgeNeXt)

July 28, 2022

  • Add freshly minted DeiT-III Medium (width=512, depth=12, num_heads=8) model weights. Thanks Hugo Touvron!

July 27, 2022

  • All runtime benchmark and validation result csv files are finally up-to-date!
  • A few more weights & model defs added:
    • darknetaa53 - 79.8 @ 256, 80.5 @ 288
    • convnext_nano - 80.8 @ 224, 81.5 @ 288
    • cs3sedarknet_l - 81.2 @ 256, 81.8 @ 288
    • cs3darknet_x - 81.8 @ 256, 82.2 @ 288
    • cs3sedarknet_x - 82.2 @ 256, 82.7 @ 288
    • cs3edgenet_x - 82.2 @ 256, 82.7 @ 288
    • cs3se_edgenet_x - 82.8 @ 256, 83.5 @ 320
  • cs3* weights above all trained on TPU w/ bits_and_tpu branch. Thanks to TRC program!
  • Add output_stride=8 and 16 support to ConvNeXt (dilation)
  • deit3 models not being able to resize pos_emb fixed
  • Version 0.6.7 PyPi release (/w above bug fixes and new weighs since 0.6.5)

July 8, 2022

More models, more fixes

  • Official research models (w/ weights) added:
  • My own models:
    • Small ResNet defs added by request with 1 block repeats for both basic and bottleneck (resnet10 and resnet14)
    • CspNet refactored with dataclass config, simplified CrossStage3 (cs3) option. These are closer to YOLO-v5+ backbone defs.
    • More relative position vit fiddling. Two srelpos (shared relative position) models trained, and a medium w/ class token.
    • Add an alternate downsample mode to EdgeNeXt and train a small model. Better than original small, but not their new USI trained weights.
  • My own model weight results (all ImageNet-1k training)
    • resnet10t - 66.5 @ 176, 68.3 @ 224
    • resnet14t - 71.3 @ 176, 72.3 @ 224
    • resnetaa50 - 80.6 @ 224 , 81.6 @ 288
    • darknet53 - 80.0 @ 256, 80.5 @ 288
    • cs3darknet_m - 77.0 @ 256, 77.6 @ 288
    • cs3darknet_focus_m - 76.7 @ 256, 77.3 @ 288
    • cs3darknet_l - 80.4 @ 256, 80.9 @ 288
    • cs3darknet_focus_l - 80.3 @ 256, 80.9 @ 288
    • vit_srelpos_small_patch16_224 - 81.1 @ 224, 82.1 @ 320
    • vit_srelpos_medium_patch16_224 - 82.3 @ 224, 83.1 @ 320
    • vit_relpos_small_patch16_cls_224 - 82.6 @ 224, 83.6 @ 320
    • edgnext_small_rw - 79.6 @ 224, 80.4 @ 320
  • cs3, darknet, and vit_*relpos weights above all trained on TPU thanks to TRC program! Rest trained on overheating GPUs.
  • Hugging Face Hub support fixes verified, demo notebook TBA
  • Pretrained weights / configs can be loaded externally (ie from local disk) w/ support for head adaptation.
  • Add support to change image extensions scanned by timm datasets/readers. See (https://github.com/rwightman/pytorch-image-models/pull/1274#issuecomment-1178303103)
  • Default ConvNeXt LayerNorm impl to use F.layer_norm(x.permute(0, 2, 3, 1), ...).permute(0, 3, 1, 2) via LayerNorm2d in all cases.
    • a bit slower than previous custom impl on some hardware (ie Ampere w/ CL), but overall fewer regressions across wider HW / PyTorch version ranges.
    • previous impl exists as LayerNormExp2d in models/layers/norm.py
  • Numerous bug fixes
  • Currently testing for imminent PyPi 0.6.x release
  • LeViT pretraining of larger models still a WIP, they don’t train well / easily without distillation. Time to add distill support (finally)?
  • ImageNet-22k weight training + finetune ongoing, work on multi-weight support (slowly) chugging along (there are a LOT of weights, sigh) …

May 13, 2022

  • Official Swin-V2 models and weights added from (https://github.com/microsoft/Swin-Transformer). Cleaned up to support torchscript.
  • Some refactoring for existing timm Swin-V2-CR impl, will likely do a bit more to bring parts closer to official and decide whether to merge some aspects.
  • More Vision Transformer relative position / residual post-norm experiments (all trained on TPU thanks to TRC program)
    • vit_relpos_small_patch16_224 - 81.5 @ 224, 82.5 @ 320 — rel pos, layer scale, no class token, avg pool
    • vit_relpos_medium_patch16_rpn_224 - 82.3 @ 224, 83.1 @ 320 — rel pos + res-post-norm, no class token, avg pool
    • vit_relpos_medium_patch16_224 - 82.5 @ 224, 83.3 @ 320 — rel pos, layer scale, no class token, avg pool
    • vit_relpos_base_patch16_gapcls_224 - 82.8 @ 224, 83.9 @ 320 — rel pos, layer scale, class token, avg pool (by mistake)
  • Bring 512 dim, 8-head ‘medium’ ViT model variant back to life (after using in a pre DeiT ‘small’ model for first ViT impl back in 2020)
  • Add ViT relative position support for switching btw existing impl and some additions in official Swin-V2 impl for future trials
  • Sequencer2D impl (https://arxiv.org/abs/2205.01972), added via PR from author (https://github.com/okojoalg)

May 2, 2022

  • Vision Transformer experiments adding Relative Position (Swin-V2 log-coord) (vision_transformer_relpos.py) and Residual Post-Norm branches (from Swin-V2) (vision_transformer*.py)
    • vit_relpos_base_patch32_plus_rpn_256 - 79.5 @ 256, 80.6 @ 320 — rel pos + extended width + res-post-norm, no class token, avg pool
    • vit_relpos_base_patch16_224 - 82.5 @ 224, 83.6 @ 320 — rel pos, layer scale, no class token, avg pool
    • vit_base_patch16_rpn_224 - 82.3 @ 224 — rel pos + res-post-norm, no class token, avg pool
  • Vision Transformer refactor to remove representation layer that was only used in initial vit and rarely used since with newer pretrain (ie How to Train Your ViT)
  • vit_* models support removal of class token, use of global average pool, use of fc_norm (ala beit, mae).

April 22, 2022

  • timm models are now officially supported in fast.ai! Just in time for the new Practical Deep Learning course. timmdocs documentation link updated to timm.fast.ai.
  • Two more model weights added in the TPU trained series. Some In22k pretrain still in progress.
    • seresnext101d_32x8d - 83.69 @ 224, 84.35 @ 288
    • seresnextaa101d_32x8d (anti-aliased w/ AvgPool2d) - 83.85 @ 224, 84.57 @ 288

March 23, 2022

  • Add ParallelBlock and LayerScale option to base vit models to support model configs in Three things everyone should know about ViT
  • convnext_tiny_hnf (head norm first) weights trained with (close to) A2 recipe, 82.2% top-1, could do better with more epochs.

March 21, 2022

  • Merge norm_norm_norm. IMPORTANT this update for a coming 0.6.x release will likely de-stabilize the master branch for a while. Branch 0.5.x or a previous 0.5.x release can be used if stability is required.
  • Significant weights update (all TPU trained) as described in this release
    • regnety_040 - 82.3 @ 224, 82.96 @ 288
    • regnety_064 - 83.0 @ 224, 83.65 @ 288
    • regnety_080 - 83.17 @ 224, 83.86 @ 288
    • regnetv_040 - 82.44 @ 224, 83.18 @ 288 (timm pre-act)
    • regnetv_064 - 83.1 @ 224, 83.71 @ 288 (timm pre-act)
    • regnetz_040 - 83.67 @ 256, 84.25 @ 320
    • regnetz_040h - 83.77 @ 256, 84.5 @ 320 (w/ extra fc in head)
    • resnetv2_50d_gn - 80.8 @ 224, 81.96 @ 288 (pre-act GroupNorm)
    • resnetv2_50d_evos 80.77 @ 224, 82.04 @ 288 (pre-act EvoNormS)
    • regnetz_c16_evos - 81.9 @ 256, 82.64 @ 320 (EvoNormS)
    • regnetz_d8_evos - 83.42 @ 256, 84.04 @ 320 (EvoNormS)
    • xception41p - 82 @ 299 (timm pre-act)
    • xception65 - 83.17 @ 299
    • xception65p - 83.14 @ 299 (timm pre-act)
    • resnext101_64x4d - 82.46 @ 224, 83.16 @ 288
    • seresnext101_32x8d - 83.57 @ 224, 84.270 @ 288
    • resnetrs200 - 83.85 @ 256, 84.44 @ 320
  • HuggingFace hub support fixed w/ initial groundwork for allowing alternative ‘config sources’ for pretrained model definitions and weights (generic local file / remote url support soon)
  • SwinTransformer-V2 implementation added. Submitted by Christoph Reich. Training experiments and model changes by myself are ongoing so expect compat breaks.
  • Swin-S3 (AutoFormerV2) models / weights added from https://github.com/microsoft/Cream/tree/main/AutoFormerV2
  • MobileViT models w/ weights adapted from https://github.com/apple/ml-cvnets
  • PoolFormer models w/ weights adapted from https://github.com/sail-sg/poolformer
  • VOLO models w/ weights adapted from https://github.com/sail-sg/volo
  • Significant work experimenting with non-BatchNorm norm layers such as EvoNorm, FilterResponseNorm, GroupNorm, etc
  • Enhance support for alternate norm + act (‘NormAct’) layers added to a number of models, esp EfficientNet/MobileNetV3, RegNet, and aligned Xception
  • Grouped conv support added to EfficientNet family
  • Add ‘group matching’ API to all models to allow grouping model parameters for application of ‘layer-wise’ LR decay, lr scale added to LR scheduler
  • Gradient checkpointing support added to many models
  • forward_head(x, pre_logits=False) fn added to all models to allow separate calls of forward_features + forward_head
  • All vision transformer and vision MLP models update to return non-pooled / non-token selected features from foward_features, for consistency with CNN models, token selection or pooling now applied in forward_head

Feb 2, 2022

  • Chris Hughes posted an exhaustive run through of timm on his blog yesterday. Well worth a read. Getting Started with PyTorch Image Models (timm): A Practitioner’s Guide
  • I’m currently prepping to merge the norm_norm_norm branch back to master (ver 0.6.x) in next week or so.
    • The changes are more extensive than usual and may destabilize and break some model API use (aiming for full backwards compat). So, beware pip install git+https://github.com/rwightman/pytorch-image-models installs!
    • 0.5.x releases and a 0.5.x branch will remain stable with a cherry pick or two until dust clears. Recommend sticking to pypi install for a bit if you want stable.

Jan 14, 2022

  • Version 0.5.4 w/ release to be pushed to pypi. It’s been a while since last pypi update and riskier changes will be merged to main branch soon…
  • Add ConvNeXT models /w weights from official impl (https://github.com/facebookresearch/ConvNeXt), a few perf tweaks, compatible with timm features
  • Tried training a few small (~1.8-3M param) / mobile optimized models, a few are good so far, more on the way…
    • mnasnet_small - 65.6 top-1
    • mobilenetv2_050 - 65.9
    • lcnet_100/075/050 - 72.1 / 68.8 / 63.1
    • semnasnet_075 - 73
    • fbnetv3_b/d/g - 79.1 / 79.7 / 82.0
  • TinyNet models added by rsomani95
  • LCNet added via MobileNetV3 architecture

Jan 5, 2023

Dec 23, 2022 🎄☃

  • Add FlexiViT models and weights from https://github.com/google-research/big_vision (check out paper at https://arxiv.org/abs/2212.08013)
    • NOTE currently resizing is static on model creation, on-the-fly dynamic / train patch size sampling is a WIP
  • Many more models updated to multi-weight and downloadable via HF hub now (convnext, efficientnet, mobilenet, vision_transformer*, beit)
  • More model pretrained tag and adjustments, some model names changed (working on deprecation translations, consider main branch DEV branch right now, use 0.6.x for stable use)
  • More ImageNet-12k (subset of 22k) pretrain models popping up:
    • efficientnet_b5.in12k_ft_in1k - 85.9 @ 448x448
    • vit_medium_patch16_gap_384.in12k_ft_in1k - 85.5 @ 384x384
    • vit_medium_patch16_gap_256.in12k_ft_in1k - 84.5 @ 256x256
    • convnext_nano.in12k_ft_in1k - 82.9 @ 288x288

Dec 8, 2022

  • Add ‘EVA l’ to vision_transformer.py, MAE style ViT-L/14 MIM pretrain w/ EVA-CLIP targets, FT on ImageNet-1k (w/ ImageNet-22k intermediate for some)
model top1 param_count gmac macts hub
eva_large_patch14_336.in22k_ft_in22k_in1k 89.2 304.5 191.1 270.2 link
eva_large_patch14_336.in22k_ft_in1k 88.7 304.5 191.1 270.2 link
eva_large_patch14_196.in22k_ft_in22k_in1k 88.6 304.1 61.6 63.5 link
eva_large_patch14_196.in22k_ft_in1k 87.9 304.1 61.6 63.5 link

Dec 6, 2022

model top1 param_count gmac macts hub
eva_giant_patch14_560.m30m_ft_in22k_in1k 89.8 1014.4 1906.8 2577.2 link
eva_giant_patch14_336.m30m_ft_in22k_in1k 89.6 1013 620.6 550.7 link
eva_giant_patch14_336.clip_ft_in1k 89.4 1013 620.6 550.7 link
eva_giant_patch14_224.clip_ft_in1k 89.1 1012.6 267.2 192.6 link

Dec 5, 2022

  • Pre-release (0.8.0dev0) of multi-weight support (model_arch.pretrained_tag). Install with pip install --pre timm
    • vision_transformer, maxvit, convnext are the first three model impl w/ support
    • model names are changing with this (previous _21k, etc. fn will merge), still sorting out deprecation handling
    • bugs are likely, but I need feedback so please try it out
    • if stability is needed, please use 0.6.x pypi releases or clone from 0.6.x branch
  • Support for PyTorch 2.0 compile is added in train/validate/inference/benchmark, use --torchcompile argument
  • Inference script allows more control over output, select k for top-class index + prob json, csv or parquet output
  • Add a full set of fine-tuned CLIP image tower weights from both LAION-2B and original OpenAI CLIP models
model top1 param_count gmac macts hub
vit_huge_patch14_clip_336.laion2b_ft_in12k_in1k 88.6 632.5 391 407.5 link
vit_large_patch14_clip_336.openai_ft_in12k_in1k 88.3 304.5 191.1 270.2 link
vit_huge_patch14_clip_224.laion2b_ft_in12k_in1k 88.2 632 167.4 139.4 link
vit_large_patch14_clip_336.laion2b_ft_in12k_in1k 88.2 304.5 191.1 270.2 link
vit_large_patch14_clip_224.openai_ft_in12k_in1k 88.2 304.2 81.1 88.8 link
vit_large_patch14_clip_224.laion2b_ft_in12k_in1k 87.9 304.2 81.1 88.8 link
vit_large_patch14_clip_224.openai_ft_in1k 87.9 304.2 81.1 88.8 link
vit_large_patch14_clip_336.laion2b_ft_in1k 87.9 304.5 191.1 270.2 link
vit_huge_patch14_clip_224.laion2b_ft_in1k 87.6 632 167.4 139.4 link
vit_large_patch14_clip_224.laion2b_ft_in1k 87.3 304.2 81.1 88.8 link
vit_base_patch16_clip_384.laion2b_ft_in12k_in1k 87.2 86.9 55.5 101.6 link
vit_base_patch16_clip_384.openai_ft_in12k_in1k 87 86.9 55.5 101.6 link
vit_base_patch16_clip_384.laion2b_ft_in1k 86.6 86.9 55.5 101.6 link
vit_base_patch16_clip_384.openai_ft_in1k 86.2 86.9 55.5 101.6 link
vit_base_patch16_clip_224.laion2b_ft_in12k_in1k 86.2 86.6 17.6 23.9 link
vit_base_patch16_clip_224.openai_ft_in12k_in1k 85.9 86.6 17.6 23.9 link
vit_base_patch32_clip_448.laion2b_ft_in12k_in1k 85.8 88.3 17.9 23.9 link
vit_base_patch16_clip_224.laion2b_ft_in1k 85.5 86.6 17.6 23.9 link
vit_base_patch32_clip_384.laion2b_ft_in12k_in1k 85.4 88.3 13.1 16.5 link
vit_base_patch16_clip_224.openai_ft_in1k 85.3 86.6 17.6 23.9 link
vit_base_patch32_clip_384.openai_ft_in12k_in1k 85.2 88.3 13.1 16.5 link
vit_base_patch32_clip_224.laion2b_ft_in12k_in1k 83.3 88.2 4.4 5 link
vit_base_patch32_clip_224.laion2b_ft_in1k 82.6 88.2 4.4 5 link
vit_base_patch32_clip_224.openai_ft_in1k 81.9 88.2 4.4 5 link
  • Port of MaxViT Tensorflow Weights from official impl at https://github.com/google-research/maxvit
    • There was larger than expected drops for the upscaled 384/512 in21k fine-tune weights, possible detail missing, but the 21k FT did seem sensitive to small preprocessing
model top1 param_count gmac macts hub
maxvit_xlarge_tf_512.in21k_ft_in1k 88.5 475.8 534.1 1413.2 link
maxvit_xlarge_tf_384.in21k_ft_in1k 88.3 475.3 292.8 668.8 link
maxvit_base_tf_512.in21k_ft_in1k 88.2 119.9 138 704 link
maxvit_large_tf_512.in21k_ft_in1k 88 212.3 244.8 942.2 link
maxvit_large_tf_384.in21k_ft_in1k 88 212 132.6 445.8 link
maxvit_base_tf_384.in21k_ft_in1k 87.9 119.6 73.8 332.9 link
maxvit_base_tf_512.in1k 86.6 119.9 138 704 link
maxvit_large_tf_512.in1k 86.5 212.3 244.8 942.2 link
maxvit_base_tf_384.in1k 86.3 119.6 73.8 332.9 link
maxvit_large_tf_384.in1k 86.2 212 132.6 445.8 link
maxvit_small_tf_512.in1k 86.1 69.1 67.3 383.8 link
maxvit_tiny_tf_512.in1k 85.7 31 33.5 257.6 link
maxvit_small_tf_384.in1k 85.5 69 35.9 183.6 link
maxvit_tiny_tf_384.in1k 85.1 31 17.5 123.4 link
maxvit_large_tf_224.in1k 84.9 211.8 43.7 127.4 link
maxvit_base_tf_224.in1k 84.9 119.5 24 95 link
maxvit_small_tf_224.in1k 84.4 68.9 11.7 53.2 link
maxvit_tiny_tf_224.in1k 83.4 30.9 5.6 35.8 link

Oct 15, 2022

  • Train and validation script enhancements
  • Non-GPU (ie CPU) device support
  • SLURM compatibility for train script
  • HF datasets support (via ReaderHfds)
  • TFDS/WDS dataloading improvements (sample padding/wrap for distributed use fixed wrt sample count estimate)
  • in_chans !=3 support for scripts / loader
  • Adan optimizer
  • Can enable per-step LR scheduling via args
  • Dataset ‘parsers’ renamed to ‘readers’, more descriptive of purpose
  • AMP args changed, APEX via --amp-impl apex, bfloat16 supportedf via --amp-dtype bfloat16
  • main branch switched to 0.7.x version, 0.6x forked for stable release of weight only adds
  • master -> main branch rename

Oct 10, 2022

  • More weights in maxxvit series, incl first ConvNeXt block based coatnext and maxxvit experiments:
    • coatnext_nano_rw_224 - 82.0 @ 224 (G) — (uses ConvNeXt conv block, no BatchNorm)
    • maxxvit_rmlp_nano_rw_256 - 83.0 @ 256, 83.7 @ 320 (G) (uses ConvNeXt conv block, no BN)
    • maxvit_rmlp_small_rw_224 - 84.5 @ 224, 85.1 @ 320 (G)
    • maxxvit_rmlp_small_rw_256 - 84.6 @ 256, 84.9 @ 288 (G) — could be trained better, hparams need tuning (uses ConvNeXt block, no BN)
    • coatnet_rmlp_2_rw_224 - 84.6 @ 224, 85 @ 320 (T)
    • NOTE: official MaxVit weights (in1k) have been released at https://github.com/google-research/maxvit — some extra work is needed to port and adapt since my impl was created independently of theirs and has a few small differences + the whole TF same padding fun.

Sept 23, 2022

  • LAION-2B CLIP image towers supported as pretrained backbones for fine-tune or features (no classifier)
    • vit_base_patch32_224_clip_laion2b
    • vit_large_patch14_224_clip_laion2b
    • vit_huge_patch14_224_clip_laion2b
    • vit_giant_patch14_224_clip_laion2b

Sept 7, 2022

  • Hugging Face timm docs home now exists, look for more here in the future
  • Add BEiT-v2 weights for base and large 224x224 models from https://github.com/microsoft/unilm/tree/master/beit2
  • Add more weights in maxxvit series incl a pico (7.5M params, 1.9 GMACs), two tiny variants:
    • maxvit_rmlp_pico_rw_256 - 80.5 @ 256, 81.3 @ 320 (T)
    • maxvit_tiny_rw_224 - 83.5 @ 224 (G)
    • maxvit_rmlp_tiny_rw_256 - 84.2 @ 256, 84.8 @ 320 (T)

Aug 29, 2022

  • MaxVit window size scales with img_size by default. Add new RelPosMlp MaxViT weight that leverages this:
    • maxvit_rmlp_nano_rw_256 - 83.0 @ 256, 83.6 @ 320 (T)

Aug 26, 2022

Aug 15, 2022

  • ConvNeXt atto weights added
    • convnext_atto - 75.7 @ 224, 77.0 @ 288
    • convnext_atto_ols - 75.9 @ 224, 77.2 @ 288

Aug 5, 2022

  • More custom ConvNeXt smaller model defs with weights
    • convnext_femto - 77.5 @ 224, 78.7 @ 288
    • convnext_femto_ols - 77.9 @ 224, 78.9 @ 288
    • convnext_pico - 79.5 @ 224, 80.4 @ 288
    • convnext_pico_ols - 79.5 @ 224, 80.5 @ 288
    • convnext_nano_ols - 80.9 @ 224, 81.6 @ 288
  • Updated EdgeNeXt to improve ONNX export, add new base variant and weights from original (https://github.com/mmaaz60/EdgeNeXt)

July 28, 2022

  • Add freshly minted DeiT-III Medium (width=512, depth=12, num_heads=8) model weights. Thanks Hugo Touvron!

July 27, 2022

  • All runtime benchmark and validation result csv files are up-to-date!
  • A few more weights & model defs added:
    • darknetaa53 - 79.8 @ 256, 80.5 @ 288
    • convnext_nano - 80.8 @ 224, 81.5 @ 288
    • cs3sedarknet_l - 81.2 @ 256, 81.8 @ 288
    • cs3darknet_x - 81.8 @ 256, 82.2 @ 288
    • cs3sedarknet_x - 82.2 @ 256, 82.7 @ 288
    • cs3edgenet_x - 82.2 @ 256, 82.7 @ 288
    • cs3se_edgenet_x - 82.8 @ 256, 83.5 @ 320
  • cs3* weights above all trained on TPU w/ bits_and_tpu branch. Thanks to TRC program!
  • Add output_stride=8 and 16 support to ConvNeXt (dilation)
  • deit3 models not being able to resize pos_emb fixed
  • Version 0.6.7 PyPi release (/w above bug fixes and new weighs since 0.6.5)

July 8, 2022

More models, more fixes

  • Official research models (w/ weights) added:
  • My own models:
    • Small ResNet defs added by request with 1 block repeats for both basic and bottleneck (resnet10 and resnet14)
    • CspNet refactored with dataclass config, simplified CrossStage3 (cs3) option. These are closer to YOLO-v5+ backbone defs.
    • More relative position vit fiddling. Two srelpos (shared relative position) models trained, and a medium w/ class token.
    • Add an alternate downsample mode to EdgeNeXt and train a small model. Better than original small, but not their new USI trained weights.
  • My own model weight results (all ImageNet-1k training)
    • resnet10t - 66.5 @ 176, 68.3 @ 224
    • resnet14t - 71.3 @ 176, 72.3 @ 224
    • resnetaa50 - 80.6 @ 224 , 81.6 @ 288
    • darknet53 - 80.0 @ 256, 80.5 @ 288
    • cs3darknet_m - 77.0 @ 256, 77.6 @ 288
    • cs3darknet_focus_m - 76.7 @ 256, 77.3 @ 288
    • cs3darknet_l - 80.4 @ 256, 80.9 @ 288
    • cs3darknet_focus_l - 80.3 @ 256, 80.9 @ 288
    • vit_srelpos_small_patch16_224 - 81.1 @ 224, 82.1 @ 320
    • vit_srelpos_medium_patch16_224 - 82.3 @ 224, 83.1 @ 320
    • vit_relpos_small_patch16_cls_224 - 82.6 @ 224, 83.6 @ 320
    • edgnext_small_rw - 79.6 @ 224, 80.4 @ 320
  • cs3, darknet, and vit_*relpos weights above all trained on TPU thanks to TRC program! Rest trained on overheating GPUs.
  • Hugging Face Hub support fixes verified, demo notebook TBA
  • Pretrained weights / configs can be loaded externally (ie from local disk) w/ support for head adaptation.
  • Add support to change image extensions scanned by timm datasets/parsers. See (https://github.com/rwightman/pytorch-image-models/pull/1274#issuecomment-1178303103)
  • Default ConvNeXt LayerNorm impl to use F.layer_norm(x.permute(0, 2, 3, 1), ...).permute(0, 3, 1, 2) via LayerNorm2d in all cases.
    • a bit slower than previous custom impl on some hardware (ie Ampere w/ CL), but overall fewer regressions across wider HW / PyTorch version ranges.
    • previous impl exists as LayerNormExp2d in models/layers/norm.py
  • Numerous bug fixes
  • Currently testing for imminent PyPi 0.6.x release
  • LeViT pretraining of larger models still a WIP, they don’t train well / easily without distillation. Time to add distill support (finally)?
  • ImageNet-22k weight training + finetune ongoing, work on multi-weight support (slowly) chugging along (there are a LOT of weights, sigh) …

May 13, 2022

  • Official Swin-V2 models and weights added from (https://github.com/microsoft/Swin-Transformer). Cleaned up to support torchscript.
  • Some refactoring for existing timm Swin-V2-CR impl, will likely do a bit more to bring parts closer to official and decide whether to merge some aspects.
  • More Vision Transformer relative position / residual post-norm experiments (all trained on TPU thanks to TRC program)
    • vit_relpos_small_patch16_224 - 81.5 @ 224, 82.5 @ 320 — rel pos, layer scale, no class token, avg pool
    • vit_relpos_medium_patch16_rpn_224 - 82.3 @ 224, 83.1 @ 320 — rel pos + res-post-norm, no class token, avg pool
    • vit_relpos_medium_patch16_224 - 82.5 @ 224, 83.3 @ 320 — rel pos, layer scale, no class token, avg pool
    • vit_relpos_base_patch16_gapcls_224 - 82.8 @ 224, 83.9 @ 320 — rel pos, layer scale, class token, avg pool (by mistake)
  • Bring 512 dim, 8-head ‘medium’ ViT model variant back to life (after using in a pre DeiT ‘small’ model for first ViT impl back in 2020)
  • Add ViT relative position support for switching btw existing impl and some additions in official Swin-V2 impl for future trials
  • Sequencer2D impl (https://arxiv.org/abs/2205.01972), added via PR from author (https://github.com/okojoalg)

May 2, 2022

  • Vision Transformer experiments adding Relative Position (Swin-V2 log-coord) (vision_transformer_relpos.py) and Residual Post-Norm branches (from Swin-V2) (vision_transformer*.py)
    • vit_relpos_base_patch32_plus_rpn_256 - 79.5 @ 256, 80.6 @ 320 — rel pos + extended width + res-post-norm, no class token, avg pool
    • vit_relpos_base_patch16_224 - 82.5 @ 224, 83.6 @ 320 — rel pos, layer scale, no class token, avg pool
    • vit_base_patch16_rpn_224 - 82.3 @ 224 — rel pos + res-post-norm, no class token, avg pool
  • Vision Transformer refactor to remove representation layer that was only used in initial vit and rarely used since with newer pretrain (ie How to Train Your ViT)
  • vit_* models support removal of class token, use of global average pool, use of fc_norm (ala beit, mae).

April 22, 2022

  • timm models are now officially supported in fast.ai! Just in time for the new Practical Deep Learning course. timmdocs documentation link updated to timm.fast.ai.
  • Two more model weights added in the TPU trained series. Some In22k pretrain still in progress.
    • seresnext101d_32x8d - 83.69 @ 224, 84.35 @ 288
    • seresnextaa101d_32x8d (anti-aliased w/ AvgPool2d) - 83.85 @ 224, 84.57 @ 288

March 23, 2022

  • Add ParallelBlock and LayerScale option to base vit models to support model configs in Three things everyone should know about ViT
  • convnext_tiny_hnf (head norm first) weights trained with (close to) A2 recipe, 82.2% top-1, could do better with more epochs.

March 21, 2022

  • Merge norm_norm_norm. IMPORTANT this update for a coming 0.6.x release will likely de-stabilize the master branch for a while. Branch 0.5.x or a previous 0.5.x release can be used if stability is required.
  • Significant weights update (all TPU trained) as described in this release
    • regnety_040 - 82.3 @ 224, 82.96 @ 288
    • regnety_064 - 83.0 @ 224, 83.65 @ 288
    • regnety_080 - 83.17 @ 224, 83.86 @ 288
    • regnetv_040 - 82.44 @ 224, 83.18 @ 288 (timm pre-act)
    • regnetv_064 - 83.1 @ 224, 83.71 @ 288 (timm pre-act)
    • regnetz_040 - 83.67 @ 256, 84.25 @ 320
    • regnetz_040h - 83.77 @ 256, 84.5 @ 320 (w/ extra fc in head)
    • resnetv2_50d_gn - 80.8 @ 224, 81.96 @ 288 (pre-act GroupNorm)
    • resnetv2_50d_evos 80.77 @ 224, 82.04 @ 288 (pre-act EvoNormS)
    • regnetz_c16_evos - 81.9 @ 256, 82.64 @ 320 (EvoNormS)
    • regnetz_d8_evos - 83.42 @ 256, 84.04 @ 320 (EvoNormS)
    • xception41p - 82 @ 299 (timm pre-act)
    • xception65 - 83.17 @ 299
    • xception65p - 83.14 @ 299 (timm pre-act)
    • resnext101_64x4d - 82.46 @ 224, 83.16 @ 288
    • seresnext101_32x8d - 83.57 @ 224, 84.270 @ 288
    • resnetrs200 - 83.85 @ 256, 84.44 @ 320
  • HuggingFace hub support fixed w/ initial groundwork for allowing alternative ‘config sources’ for pretrained model definitions and weights (generic local file / remote url support soon)
  • SwinTransformer-V2 implementation added. Submitted by Christoph Reich. Training experiments and model changes by myself are ongoing so expect compat breaks.
  • Swin-S3 (AutoFormerV2) models / weights added from https://github.com/microsoft/Cream/tree/main/AutoFormerV2
  • MobileViT models w/ weights adapted from https://github.com/apple/ml-cvnets
  • PoolFormer models w/ weights adapted from https://github.com/sail-sg/poolformer
  • VOLO models w/ weights adapted from https://github.com/sail-sg/volo
  • Significant work experimenting with non-BatchNorm norm layers such as EvoNorm, FilterResponseNorm, GroupNorm, etc
  • Enhance support for alternate norm + act (‘NormAct’) layers added to a number of models, esp EfficientNet/MobileNetV3, RegNet, and aligned Xception
  • Grouped conv support added to EfficientNet family
  • Add ‘group matching’ API to all models to allow grouping model parameters for application of ‘layer-wise’ LR decay, lr scale added to LR scheduler
  • Gradient checkpointing support added to many models
  • forward_head(x, pre_logits=False) fn added to all models to allow separate calls of forward_features + forward_head
  • All vision transformer and vision MLP models update to return non-pooled / non-token selected features from foward_features, for consistency with CNN models, token selection or pooling now applied in forward_head

Feb 2, 2022

  • Chris Hughes posted an exhaustive run through of timm on his blog yesterday. Well worth a read. Getting Started with PyTorch Image Models (timm): A Practitioner’s Guide
  • I’m currently prepping to merge the norm_norm_norm branch back to master (ver 0.6.x) in next week or so.
    • The changes are more extensive than usual and may destabilize and break some model API use (aiming for full backwards compat). So, beware pip install git+https://github.com/rwightman/pytorch-image-models installs!
    • 0.5.x releases and a 0.5.x branch will remain stable with a cherry pick or two until dust clears. Recommend sticking to pypi install for a bit if you want stable.

Jan 14, 2022

  • Version 0.5.4 w/ release to be pushed to pypi. It’s been a while since last pypi update and riskier changes will be merged to main branch soon…
  • Add ConvNeXT models /w weights from official impl (https://github.com/facebookresearch/ConvNeXt), a few perf tweaks, compatible with timm features
  • Tried training a few small (~1.8-3M param) / mobile optimized models, a few are good so far, more on the way…
    • mnasnet_small - 65.6 top-1
    • mobilenetv2_050 - 65.9
    • lcnet_100/075/050 - 72.1 / 68.8 / 63.1
    • semnasnet_075 - 73
    • fbnetv3_b/d/g - 79.1 / 79.7 / 82.0
  • TinyNet models added by rsomani95
  • LCNet added via MobileNetV3 architecture
< > Update on GitHub