license: mit
Uno Card Classifier
This is a simple model designed to recognize Uno cards from images. It was trained using a limited dataset, but employs some creative techniques to achieve decent results. Here's a breakdown of what makes this model tick:
Model Description
The model is a custom classifier that leverages the power of a pre-trained CLIP (Contrastive Language-Image Pre-training) vision model. Specifically:
- Backbone: The model uses the CLIPVisionModel from openai/clip-vit-large-patch14, a powerful image encoder pre-trained on a massive dataset of images and text. This allows us to benefit from learned features that generalize well.
- Classifier Head: On top of the CLIP vision model, we add a linear layer (nn.Linear) that projects the CLIP embeddings to the number of classes we want to predict (each Uno card type, plus "NO_CARD").
- Dropout: A dropout layer (nn.Dropout) is added to prevent overfitting and increase robustness.
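The three pieces above could be wired together roughly as follows. This is a minimal sketch, not the author's model.py: the class name UnoCardClassifier, the dropout rate of 0.1, the use of the pooled CLIP output, and the class count of 5 are all assumptions made for illustration. To keep it quick to run offline, the demo builds a tiny randomly initialised CLIP vision model instead of downloading the real weights.

```python
import torch
from torch import nn
from transformers import CLIPVisionConfig, CLIPVisionModel


class UnoCardClassifier(nn.Module):
    """CLIP vision backbone -> dropout -> linear head (hypothetical class name)."""

    def __init__(self, backbone: CLIPVisionModel, num_classes: int, dropout: float = 0.1):
        super().__init__()
        self.backbone = backbone
        self.dropout = nn.Dropout(dropout)  # rate is an assumption, not from the model card
        self.head = nn.Linear(backbone.config.hidden_size, num_classes)

    def forward(self, pixel_values: torch.Tensor) -> torch.Tensor:
        # pooler_output is one embedding vector per image
        pooled = self.backbone(pixel_values=pixel_values).pooler_output
        return self.head(self.dropout(pooled))


# Real use would load the pretrained weights (a large download):
#   backbone = CLIPVisionModel.from_pretrained("openai/clip-vit-large-patch14")
# A tiny random config stands in here so the sketch runs without network access.
tiny_config = CLIPVisionConfig(
    hidden_size=32,
    intermediate_size=64,
    num_hidden_layers=1,
    num_attention_heads=2,
    image_size=32,
    patch_size=16,
)
model = UnoCardClassifier(CLIPVisionModel(tiny_config), num_classes=5).eval()

with torch.no_grad():
    logits = model(torch.randn(2, 3, 32, 32))  # two fake 32x32 RGB images
print(logits.shape)  # one row of class scores per image
```

Because the backbone is a regular submodule, passing model.parameters() to an optimizer fine-tunes the whole vision model together with the head.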
How It Was Trained with Limited Data
Training a model like this with just a few card images is challenging. Here's how I tackled this:
Data Augmentation: We generate a lot of images on-the-fly to create training examples by:
- Background Overlay: Cards are pasted onto random background images from the natural-images dataset.
- Perspective Transforms: The cards are perspective-warped to simulate different viewing angles.
- Rotations: Cards are rotated to introduce variety in orientations.
- Random Scaling and Positioning: The card's size and position on the background are randomized.
- Background Only Images: 25% of the time, the model is trained on just a background image. This helps it recognize when no card is present.
Pre-trained Features: CLIP is trained on massive amounts of images and text, making it really good at extracting high-quality features. Instead of training from scratch, we fine-tune the entire vision model on the dataset using a very low learning rate.
"NO_CARD" Class: This helps the model understand when it's looking at just a background, not a card.
Training Details
- Dataset:
  - A small collection of Uno card images, with each file name giving the type of card.
  - Background images from the natural-images dataset.
  - A custom PyTorch dataset class, AugmentedUnoCardDataset, handles the loading, data augmentation and label creation.
- Preprocessing: The images are processed using the CLIPProcessor to convert them into the input format that the CLIP model expects, specifically pixel values.
- Optimizer: AdamW is used to train the network, with cross-entropy loss (nn.CrossEntropyLoss) for classification.
- Training:
  - The model is generally trained for 50 epochs, which is reliable enough to handle many different card conditions. This checkpoint, however, was trained for 130 epochs.
  - We use a batch size of 4.
  - A low learning rate of 1e-5 is used to avoid instability. This is super important: even though the rate is low, the model learns extremely well with it.
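The training setup described above (AdamW, cross-entropy loss, batch size 4, learning rate 1e-5) might look like the loop below. It is a sketch only: a small linear model and random tensors stand in for the CLIP-based classifier and for the pixel values that CLIPProcessor would produce, and the names train_one_epoch and NUM_CLASSES, along with the class count of 5, are hypothetical.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)
NUM_CLASSES = 5  # placeholder; the real count is every card type plus NO_CARD

# Stand-in model so the sketch runs quickly; the real model is the CLIP-based classifier.
model = nn.Sequential(nn.Flatten(), nn.Dropout(0.1), nn.Linear(3 * 32 * 32, NUM_CLASSES))

# Random tensors standing in for preprocessed pixel values and integer class labels.
pixel_values = torch.randn(8, 3, 32, 32)
labels = torch.randint(0, NUM_CLASSES, (8,))
loader = DataLoader(TensorDataset(pixel_values, labels), batch_size=4, shuffle=True)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
criterion = nn.CrossEntropyLoss()


def train_one_epoch(model, loader, optimizer, criterion):
    """Run one pass over the loader and return the mean loss per sample."""
    model.train()  # enables dropout
    total = 0.0
    for batch_pixels, batch_labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(batch_pixels), batch_labels)
        loss.backward()
        optimizer.step()
        total += loss.item() * batch_pixels.size(0)
    return total / len(loader.dataset)


# The model card mentions 50 to 130 epochs; 3 keeps the demo short.
epoch_losses = [train_one_epoch(model, loader, optimizer, criterion) for _ in range(3)]
print(epoch_losses)
```

nn.CrossEntropyLoss expects raw logits and integer class indices, so the model should not apply softmax before the loss.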
Architecture Details
- The model was trained using a CLIP Vision Model as a base.
- The model adds a linear layer on top to allow classification.
- A dropout layer helps with overfitting.
Usage
To use this model you will need the PyTorch, Transformers and torchvision libraries:
pip install torch transformers torchvision
Then download model.py from the files of this Hugging Face page and use it in your Python scripts, and that is it. The current model.py fetches an image from the internet, preprocesses it, builds a heatmap of the attention (what the model cares about), and returns its guess and confidence.
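For readers wondering how a guess and a confidence come out of a classifier like this, the last step is typically a softmax over the logits. The snippet below is a generic sketch of that step, not a copy of model.py: the predict function, the class names, and the example logits are all invented for illustration.

```python
import torch

# Hypothetical label list; the real one is derived from the card file names.
CLASS_NAMES = ["NO_CARD", "red_5", "blue_skip", "wild"]


def predict(logits: torch.Tensor, class_names):
    """Turn one row of raw class scores into (label, confidence)."""
    probs = torch.softmax(logits, dim=-1)  # scores -> probabilities that sum to 1
    confidence, index = probs.max(dim=-1)  # most likely class and its probability
    return class_names[index.item()], confidence.item()


# Made-up logits for a single image.
guess, confidence = predict(torch.tensor([0.2, 3.1, -1.0, 0.5]), CLASS_NAMES)
print(f"{guess} ({confidence:.1%})")
```

A high confidence for NO_CARD is the signal that the image probably contains no card at all.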
Limitations
- Limited Dataset: The model was trained on a small dataset (only 56 cards) and may not be robust to unseen variations, although augmentation makes it quite robust in practice.
- Not that good: It was trained for only about 130 epochs (about 5 minutes), which still isn't enough, since it hasn't seen that many augmentations.
Datasets:
- Natural Images - https://www.kaggle.com/datasets/prasunroy/natural-images - Random backgrounds that the cards are pasted onto.
- Uno Cards - https://www.kaggle.com/datasets/vatsalparsaniya/uno-cards - The card images themselves.
Conclusion
This is a basic attempt at creating an Uno card recognizer, made only for fun and as a learning experience. The model can only recognize one card at a time, and it understands wild cards the best because of their simple style.
Model tree for Hayloo9838/uno-recognizer
Base model
openai/clip-vit-large-patch14