vit-90-animals


Model description

This model is a fine-tuned version of the Vision Transformer google/vit-base-patch16-224, trained on an animal image dataset from Kaggle to classify images into 90 different animal species. It achieves high accuracy on unseen data and was trained with supervised learning. The model can be used for general-purpose image classification in the animal domain and serves as a baseline for comparison with zero-shot classification models such as CLIP.

The model achieves the following results on the evaluation set:

  • Loss: 0.0840
  • Accuracy: 0.9796
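A minimal inference sketch using the Hugging Face pipeline API; it assumes the checkpoint is published under the repo id maceythm/vit-90-animals and that an image path is supplied by the caller (the path below is a placeholder):

```python
# Inference sketch: classify one image with the fine-tuned ViT.
# "path/to/animal.jpg" is a placeholder; running this downloads the
# model weights from the Hugging Face Hub on first use.
from transformers import pipeline

classifier = pipeline("image-classification", model="maceythm/vit-90-animals")

predictions = classifier("path/to/animal.jpg", top_k=5)
for p in predictions:
    print(f"{p['label']}: {p['score']:.3f}")
```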

Intended uses & limitations

Intended uses

  • Animal image classification (educational, demo, prototyping)
  • Benchmarking against zero-shot classification models
  • Use in Gradio interfaces or image analysis tools
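The Gradio use case above could be wired up roughly as follows; this is a sketch assuming the repo id maceythm/vit-90-animals and a local Gradio installation, not a tested deployment:

```python
# Hypothetical Gradio demo around the classifier.
import gradio as gr
from transformers import pipeline

classifier = pipeline("image-classification", model="maceythm/vit-90-animals")

def predict(image):
    # The pipeline returns a list of {"label": ..., "score": ...} dicts;
    # gr.Label expects a {label: score} mapping.
    return {p["label"]: p["score"] for p in classifier(image, top_k=5)}

demo = gr.Interface(
    fn=predict,
    inputs=gr.Image(type="pil"),
    outputs=gr.Label(num_top_classes=5),
)

if __name__ == "__main__":
    demo.launch()
</imports>
```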

Limitations

  • The model is limited to the 90 animal classes it was trained on
  • It may not generalize well to image domains outside of its training distribution
  • Performance can degrade with poor image quality or occlusions

Training and evaluation data

The model was trained on a dataset containing 5,400 animal images categorized into 90 distinct classes. The dataset was obtained from Kaggle and, according to its creator, was originally sourced from Google Images. The training/validation/test split was 80/10/10, and the label distribution is relatively balanced across classes.
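The 80/10/10 split can be sketched as a per-class shuffle so the balanced label distribution is preserved; the file names below are hypothetical, but the per-class counts (90 classes × 60 images = 5,400) match the dataset:

```python
import random

# Per-class 80/10/10 split sketch; applied to each of the 90 classes,
# every class contributes 48 train, 6 validation, and 6 test images.
random.seed(42)

def split_class(files, train=0.8, val=0.1):
    files = files[:]          # copy so the caller's list is untouched
    random.shuffle(files)
    n = len(files)
    n_train = int(n * train)
    n_val = int(n * val)
    return (files[:n_train],
            files[n_train:n_train + n_val],
            files[n_train + n_val:])

# Hypothetical file names for one class of 60 images.
files = [f"cat_{i:02d}.jpg" for i in range(60)]
train, val, test = split_class(files)
print(len(train), len(val), len(test))  # 48 6 6
```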

Evaluation was conducted on the test split and compared to results from a zero-shot model (openai/clip-vit-large-patch14) using the same label set.
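The zero-shot comparison described above can be sketched with the transformers zero-shot image-classification pipeline; the label list below is a truncated illustration (the real comparison used all 90 class names), and the image path is a placeholder:

```python
# Zero-shot baseline sketch: CLIP scores an image against the same
# label set used by the fine-tuned model. Only three of the 90 labels
# are shown here; "path/to/animal.jpg" is a placeholder.
from transformers import pipeline

clip = pipeline(
    "zero-shot-image-classification",
    model="openai/clip-vit-large-patch14",
)

labels = ["antelope", "badger", "bat"]  # truncated; use the full 90-class set
result = clip("path/to/animal.jpg", candidate_labels=labels)
print(result[0]["label"], result[0]["score"])
```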

Training procedure

  • Base model: google/vit-base-patch16-224
  • Fine-tuning method: Supervised training using the Hugging Face Trainer class
  • Data augmentation: Applied during training (e.g., RandomHorizontalFlip, ColorJitter)
  • Training duration: 5 epochs; runs were performed both with and without augmentation
  • Optimizer: AdamW (default settings)
  • Evaluation metrics: Accuracy, precision, and recall
  • Best performance (no augmentation): 98.3% test accuracy
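The evaluation metrics listed above (accuracy plus macro-averaged precision and recall) can be written out without a metrics library; this is an illustrative implementation on a tiny hypothetical label sample, not the card's actual evaluation code:

```python
# Accuracy and macro-averaged precision/recall, implemented directly.
def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_precision_recall(y_true, y_pred):
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls = [], []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
    return sum(precisions) / len(labels), sum(recalls) / len(labels)

# Tiny hypothetical sample with three of the 90 classes.
y_true = ["cat", "dog", "cat", "fox"]
y_pred = ["cat", "dog", "dog", "fox"]
print(accuracy(y_true, y_pred))  # 0.75
```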

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0003
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 42
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • num_epochs: 5
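The hyperparameters above map onto a Hugging Face TrainingArguments object roughly as follows; this is a reconstruction from the list, not the card's original training script (the output_dir is an assumption):

```python
# Config sketch: the listed hyperparameters as TrainingArguments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="vit-90-animals",       # assumed; not stated in the card
    learning_rate=3e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=5,
)
```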

Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|---------------|-------|------|-----------------|----------|
| 1.2021        | 1.0   | 270  | 0.3500          | 0.9611   |
| 0.2978        | 2.0   | 540  | 0.1766          | 0.9685   |
| 0.1886        | 3.0   | 810  | 0.1500          | 0.9685   |
| 0.1706        | 4.0   | 1080 | 0.1409          | 0.9685   |
| 0.1678        | 5.0   | 1350 | 0.1373          | 0.9667   |

Framework versions

  • Transformers 4.50.0
  • Pytorch 2.6.0+cu124
  • Datasets 3.4.1
  • Tokenizers 0.21.1


Evaluation results

  • Accuracy on iamsouravbanerjee/animal-image-dataset-90-different-animals (self-reported): 0.980