license: mit

Uno Card Classifier

This is a simple model designed to recognize Uno cards from images. It was trained using a limited dataset, but employs some creative techniques to achieve decent results. Here's a breakdown of what makes this model tick:

Model Description

The model is a custom classifier that leverages the power of a pre-trained CLIP (Contrastive Language-Image Pre-training) vision model. Specifically:

Backbone: The model uses the CLIPVisionModel from openai/clip-vit-large-patch14, a powerful image encoder pre-trained on a massive dataset of images and text. This allows us to benefit from learned features that generalize well.
Classifier Head: On top of the CLIP vision model, we add a linear layer (nn.Linear) that projects the CLIP embeddings to the number of classes we want to predict (each Uno card type, plus "NO_CARD").
Dropout: A dropout layer (nn.Dropout) is added to prevent overfitting and increase robustness.
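
The three pieces above fit together roughly as follows. This is a minimal sketch, not the code from this repository: the class name UnoCardClassifier, the dropout rate of 0.1, and the use of CLIP's pooled output as the embedding are all assumptions. The shape check at the end uses a tiny, randomly initialised backbone so that nothing has to be downloaded.

```python
import torch
from torch import nn
from transformers import CLIPVisionConfig, CLIPVisionModel


class UnoCardClassifier(nn.Module):
    """CLIP vision backbone + dropout + linear head (hypothetical name)."""

    def __init__(self, backbone: CLIPVisionModel, num_classes: int,
                 dropout: float = 0.1):  # dropout rate is an assumption
        super().__init__()
        self.backbone = backbone
        self.dropout = nn.Dropout(dropout)
        # Project the CLIP embedding to one logit per class.
        self.head = nn.Linear(backbone.config.hidden_size, num_classes)

    def forward(self, pixel_values: torch.Tensor) -> torch.Tensor:
        outputs = self.backbone(pixel_values=pixel_values)
        pooled = outputs.pooler_output  # shape: (batch, hidden_size)
        return self.head(self.dropout(pooled))


# Shape check with a tiny random backbone (assumed sizes, demo only).
tiny_config = CLIPVisionConfig(
    hidden_size=32, intermediate_size=64, num_hidden_layers=1,
    num_attention_heads=2, image_size=32, patch_size=16,
)
model = UnoCardClassifier(CLIPVisionModel(tiny_config), num_classes=5)
model.eval()
with torch.no_grad():
    logits = model(torch.zeros(2, 3, 32, 32))
print(logits.shape)  # torch.Size([2, 5])
```

For the real backbone, one would pass CLIPVisionModel.from_pretrained("openai/clip-vit-large-patch14") instead of the tiny model, with num_classes set to the number of card types plus one for "NO_CARD".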

How It Was Trained with Limited Data

Training a model like this with just a few card images is challenging. Here's how I tackled this:

Data Augmentation: We generate a lot of images on-the-fly to create training examples by:
- Background Overlay: Cards are pasted onto random background images from the natural-images dataset.
- Perspective Transforms: The cards are perspective-warped to simulate different viewing angles.
- Rotations: Cards are rotated to introduce variety in orientations.
- Random Scaling and Positioning: The card's size and position on the background are randomized.
- Background Only Images: 25% of the time, the model is trained on just background images. This helps with differentiating when there's no card.
Pre-trained Features: CLIP is trained on massive amounts of images and text, which makes it very good at extracting high-quality features. Instead of training from scratch, we fine-tune the entire vision model on the dataset using a very low learning rate.
"NO_CARD" Class: This helps the model understand when it's looking at just a background, not a card.

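The augmentation steps listed above could be sketched with Pillow as shown here. This is an illustration under stated assumptions rather than the actual AugmentedUnoCardDataset code: the function name make_training_example, the 224-pixel output size, and the scale range are hypothetical. The perspective warp is left out to keep the sketch short.

```python
import random
from PIL import Image

NO_CARD = "NO_CARD"


def make_training_example(card, card_label, background, rng,
                          out_size=224, p_background_only=0.25):
    """Return (image, label) for one randomly augmented training example."""
    canvas = background.convert("RGB").resize((out_size, out_size))

    # 25% of the time, return the bare background labelled NO_CARD.
    if rng.random() < p_background_only:
        return canvas, NO_CARD

    # Random scaling (assumed range: 30% to 80% of the canvas height).
    new_h = max(1, int(out_size * rng.uniform(0.3, 0.8)))
    new_w = max(1, int(card.width * new_h / card.height))
    sprite = card.convert("RGBA").resize((new_w, new_h))

    # Random rotation; expand=True keeps the corners, and the new
    # areas are transparent because the sprite is RGBA.
    sprite = sprite.rotate(rng.uniform(0.0, 360.0), expand=True)
    sprite.thumbnail((out_size, out_size))  # shrink if it no longer fits

    # Random position, then paste using the alpha channel as the mask.
    x = rng.randint(0, out_size - sprite.width)
    y = rng.randint(0, out_size - sprite.height)
    canvas.paste(sprite, (x, y), mask=sprite)
    return canvas, card_label
```

Calling it with rng = random.Random(0), a card image, and a background image yields a fresh example on every call, which is what lets a handful of card images stand in for a much larger dataset.
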
Training Details

Dataset:
- A small collection of Uno card images with a file name for the type of card.
- Background images from the natural-images dataset.
- A custom PyTorch dataset class AugmentedUnoCardDataset handles the loading, data augmentation and label creation.
Preprocessing: The images are processed using the CLIPProcessor to convert them into the input format that the CLIP model expects, specifically pixel values.
Optimizer: The network is trained with AdamW, using cross-entropy loss (nn.CrossEntropyLoss) for classification.
Training:
- The model is generally trained for 50 epochs, which is already reliable enough to handle many different states of Uno cards. This checkpoint, however, was trained for 130 epochs.
- We use a batch size of 4.
- A low learning rate of 1e-5 is used to avoid instability. (Super important! While it is low, it learns extremely well)
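
Put together, the settings above correspond to a training loop along these lines. It is a sketch, not the original training script: the function name train and the per-epoch loss history are assumptions. It works with any model that maps a batch of pixel values to class logits, and any dataset that yields (pixel_values, label) pairs.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def train(model, dataset, epochs=50, batch_size=4, lr=1e-5, device="cpu"):
    """Fine-tune `model` with AdamW and cross-entropy; return mean loss per epoch."""
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()
    model.to(device).train()

    history = []
    for _ in range(epochs):
        running = 0.0
        for pixel_values, labels in loader:
            pixel_values = pixel_values.to(device)
            labels = labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(pixel_values), labels)
            loss.backward()
            optimizer.step()
            running += loss.item()
        history.append(running / len(loader))
    return history
```

Because the augmentation is random, each epoch sees different images even though the underlying set of cards is fixed.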

Architecture Details

The model was trained using a CLIP Vision Model as a base.
The model adds a linear layer on top to allow classification.
A dropout layer helps with overfitting.

Usage

To use this model you will need the PyTorch, transformers and torchvision libraries:

pip install torch transformers torchvision

Then download model.py from the files of this Hugging Face page and use it in your Python scripts. The current model.py fetches an image from the internet, preprocesses it, builds a heatmap of the attention (what the model cares about), and returns its guess and confidence.
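
As a rough illustration of the last step, a guess and a confidence can be derived from classifier logits with a softmax. This is not taken from model.py, whose internals may differ. The function name guess_and_confidence and the class names in the example are hypothetical.

```python
import torch


def guess_and_confidence(logits, class_names):
    """Turn a 1-D tensor of logits into (predicted class name, probability)."""
    probs = torch.softmax(logits, dim=-1)
    confidence, index = probs.max(dim=-1)
    return class_names[index.item()], confidence.item()


# Hypothetical class names, for demonstration only.
name, confidence = guess_and_confidence(
    torch.tensor([0.0, 2.0, 0.0]), ["RED_5", "WILD", "NO_CARD"]
)
print(name, round(confidence, 3))  # WILD 0.787
```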

Limitations

Limited Dataset: The model was trained on a small dataset (only 56 cards) and may not be robust to unseen variations, although augmentation makes it fairly robust in practice.
Not that good: It was trained for only about 130 epochs (about 5 minutes), which still isn't enough, since it hasn't seen that many augmentations.

Datasets:

Natural Images - https://www.kaggle.com/datasets/prasunroy/natural-images - For projecting images onto random backgrounds
Uno Cards - https://www.kaggle.com/datasets/vatsalparsaniya/uno-cards - For the cards obviously.

Conclusion

This is a basic attempt at creating an Uno card recognizer, made only for fun and as a learning experience. The model can only recognize one card at a time, and it recognizes wild cards best because of their simple style.

Hayloo9838/uno-recognizer