ViT Fine-tuned on Stanford Car Dataset

Base model: https://huggingface.co/google/vit-base-patch16-224

This model achieves around 86% accuracy on the test set. You can use it as a baseline for further tuning.

Dataset Description

The Stanford Cars dataset contains 16,185 images covering 196 classes of cars. Classes are typically at the level of make, model, and year, e.g. 2012 Tesla Model S or 2012 BMW M3 coupe. In this case, the data is split into 8,144 training images, 6,041 testing images, and 2,000 validation images.

Please note: this dataset does not contain newer car models.

Using the Model in the Transformers Library

from transformers import AutoFeatureExtractor, AutoModelForImageClassification

extractor = AutoFeatureExtractor.from_pretrained("therealcyberlord/stanford-car-vit-patch16")
model = AutoModelForImageClassification.from_pretrained("therealcyberlord/stanford-car-vit-patch16")
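
Once the extractor and model are loaded, a prediction can be obtained roughly as follows. This is a minimal sketch, not the author's own inference code: the helper names `top_k` and `classify` are hypothetical, and it assumes `torch` and `Pillow` are installed and that the image path passed on the command line points to a local photo of a car. It keeps the `AutoFeatureExtractor` call from the snippet above; newer Transformers releases provide `AutoImageProcessor` for the same purpose.

```python
import math
import sys


def top_k(logits, id2label, k=5):
    """Apply a softmax to raw logits and return the k most likely (label, prob) pairs."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)[:k]
    return [(id2label[i], probs[i]) for i in ranked]


def classify(image_path, k=5):
    """Hypothetical helper: run the fine-tuned ViT on one image file."""
    # Heavy imports are kept local so top_k can be used without them.
    import torch
    from PIL import Image
    from transformers import AutoFeatureExtractor, AutoModelForImageClassification

    name = "therealcyberlord/stanford-car-vit-patch16"
    extractor = AutoFeatureExtractor.from_pretrained(name)
    model = AutoModelForImageClassification.from_pretrained(name)
    model.eval()

    image = Image.open(image_path).convert("RGB")
    inputs = extractor(images=image, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()  # one score per class
    return top_k(logits, model.config.id2label, k)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Assumed usage: python predict.py path/to/car.jpg
    for label, prob in classify(sys.argv[1]):
        print(f"{prob:.3f}  {label}")
```

Reporting the top five classes rather than a single label is a design choice for this sketch: many of the 196 classes differ only by model year, so the runner-up predictions are often informative.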

Citations

Jonathan Krause, Michael Stark, Jia Deng, Li Fei-Fei. 3D Object Representations for Fine-Grained Categorization. 4th IEEE Workshop on 3D Representation and Recognition, at ICCV 2013 (3dRR-13). Sydney, Australia. Dec. 8, 2013.
