cmaeti/mnist-i-jepa · Hugging Face

Model Name: MNIST I-JEPA Classifier
Model Type: Convolutional Neural Network with I-JEPA feature extraction
Dataset: MNIST (Modified National Institute of Standards and Technology)
- 70,000 images of handwritten digits (28x28 grayscale), covering the classes 0-9.
Task: Image classification (Digit recognition)
Framework: PyTorch
Preprocessing:
- Images are resized to 28x28 pixels, converted to grayscale (if necessary), and normalized before being fed into the model.

Input Layer: 28x28 grayscale images.
Convolutional Layers:
- Conv1: 32 filters of size 3x3, applied to the input image (1 channel).
- Conv2: 64 filters of size 3x3, applied to the output of Conv1.
Activation Functions:
- ReLU activations after each convolutional layer.
Pooling Layers:
- Max pooling with a 2x2 window after Conv1 and after Conv2 to downsample the feature maps.
Fully Connected Layers:
- The output of the convolutional layers is flattened and passed to a fully connected layer with feature_dim (default 128) neurons.
- Final fully connected layer outputs 10 units, corresponding to the 10 possible digit classes (0-9).
Output:
- Softmax activation is used on the final output layer to produce class probabilities.
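
The layer list above maps onto a small PyTorch module along these lines. Treat it as a sketch under stated assumptions rather than the published implementation: the class name `MnistConvNet` is hypothetical, `padding=1` is assumed (the card does not give padding, and it determines the flattened size), and the ReLU after the first fully connected layer is also an assumption. The forward pass returns logits and softmax is applied in a separate method, which is a design choice made here so the same module can be trained with `nn.CrossEntropyLoss`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MnistConvNet(nn.Module):  # hypothetical name, not from the model card
    def __init__(self, feature_dim: int = 128, num_classes: int = 10):
        super().__init__()
        # padding=1 is an assumption; it keeps the spatial size unchanged by each conv.
        self.conv1 = nn.Conv2d(1, 32, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        self.pool = nn.MaxPool2d(2)
        # Two 2x2 poolings take 28 -> 14 -> 7, so the flattened size is 64 * 7 * 7.
        self.fc1 = nn.Linear(64 * 7 * 7, feature_dim)
        self.fc2 = nn.Linear(feature_dim, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.pool(F.relu(self.conv1(x)))  # (N, 32, 14, 14)
        x = self.pool(F.relu(self.conv2(x)))  # (N, 64, 7, 7)
        x = torch.flatten(x, start_dim=1)     # (N, 3136)
        x = F.relu(self.fc1(x))               # assumed activation on the feature layer
        return self.fc2(x)                    # raw logits, one per digit class

    def predict_proba(self, x: torch.Tensor) -> torch.Tensor:
        # Softmax over the class dimension turns logits into probabilities.
        return F.softmax(self.forward(x), dim=1)


model = MnistConvNet()
probs = model.predict_proba(torch.randn(4, 1, 28, 28))
print(probs.shape)
```

Without padding, the flattened size would be 64 * 5 * 5 instead, so the input size of the first linear layer has to match whatever the released checkpoint used.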

How to Use:

Dependencies:

pip install torch torchvision matplotlib pillow
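
Once the dependencies are installed, inference could look roughly like the sketch below. The helper `predict_digit` and the stand-in linear model are hypothetical and exist only to make the example runnable; the card does not name a checkpoint file or loading function, so loading the released weights is left out. The sketch assumes the model returns logits; if it already applies softmax, the `F.softmax` call should be dropped.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def predict_digit(model: nn.Module, batch: torch.Tensor):
    """Return predicted digits and their probabilities for a (N, 1, 28, 28) batch."""
    model.eval()                       # disable training-only behavior such as dropout
    with torch.no_grad():              # no gradients are needed for inference
        probs = F.softmax(model(batch), dim=1)
    confidence, digit = probs.max(dim=1)
    return digit.tolist(), confidence.tolist()


# Placeholder model with random weights, used only so the example runs end to end.
stand_in = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))

# Random tensors stand in for preprocessed 28x28 grayscale digits.
digits, confidences = predict_digit(stand_in, torch.rand(2, 1, 28, 28))
print(digits, confidences)
```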

Limitations:
- The model is optimized for MNIST digits and may not generalize well to other styles of handwritten digits or to more complex images.
- Performance may degrade on inputs that differ greatly from MNIST, such as noisy or distorted digits.