---
license: agpl-3.0
language:
- en
metrics:
- accuracy
pipeline_tag: image-classification
tags:
- digits
- cnn
- mnist
- emnist
- pytorch
- handwriting-recognition
- onnx
---
# Digit & Blank Image Classifier (PyTorch CNN)

A high-accuracy convolutional neural network that classifies handwritten digits and additionally detects **blank images** (unfilled boxes) as a distinct class. It is trained with PyTorch on the **MNIST** and **EMNIST Digits** datasets plus synthetic blank images, and exported to TorchScript (`.pt`) and ONNX (`.onnx`) for portable inference.

---

## License & Attribution

This model is licensed under the **AGPL-3.0** license to comply with the [Plom Project](https://gitlab.com/plom/plom) licensing requirements.

### Developed as part of the Plom Project

**Authors & Credits**:
- Model: **Deep Shah**, Undergraduate Research Assistant, UBC
- Supervision: **Prof. Andrew Rechnitzer** and **Prof. Colin B. MacDonald**
- Project: [The Plom Project GitLab](https://gitlab.com/plom/plom)

---

## Overview

- **Input**: 1×28×28 grayscale image
- **Output**: Integer class prediction:
  - 0–9: Digits
  - 10: Blank image
- **Architecture**: CNN with three convolutional layers (BatchNorm, ReLU, max pooling, dropout) followed by two fully connected layers
- **Model Format**: TorchScript (`.pt`), ONNX (`.onnx`)
- **Training Dataset**: Combined MNIST, EMNIST Digits, and 5,000 synthetic blank images

---

## Dataset Details

### Datasets Used:

- **MNIST** – 28×28 handwritten digits (0–9), 60,000 training images
- **EMNIST Digits** – 28×28 digits extracted from handwritten characters, 240,000+ training samples
- **Blank Images** – 5,000 synthetic all-black 28×28 images, labeled as class `10` to simulate unfilled regions

### Preprocessing:

- Normalized pixel values to [0, 1]
- Converted images to channel-first format (N, C, H, W)
- Combined and shuffled datasets
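
The preprocessing steps above can be sketched with NumPy as follows. This is a minimal illustration, not the project's training script: the helper names (`preprocess`, `make_blanks`, `combine_and_shuffle`) are hypothetical, and small random arrays stand in for the real MNIST/EMNIST data.

```python
import numpy as np


def preprocess(images_uint8: np.ndarray) -> np.ndarray:
    """Scale (N, 28, 28) uint8 images to float32 in [0, 1], shaped (N, 1, 28, 28)."""
    scaled = images_uint8.astype(np.float32) / 255.0
    return scaled[:, np.newaxis, :, :]  # add the channel axis (channel-first)


def make_blanks(count: int) -> np.ndarray:
    """Create all-black 28x28 images to represent unfilled boxes."""
    return np.zeros((count, 28, 28), dtype=np.uint8)


def combine_and_shuffle(parts, seed=42):
    """Concatenate (images, labels) pairs and shuffle them with one permutation."""
    images = np.concatenate([x for x, _ in parts], axis=0)
    labels = np.concatenate([y for _, y in parts], axis=0)
    order = np.random.default_rng(seed).permutation(len(images))
    return images[order], labels[order]


# Stand-in data (assumption): 8 random "digit" images and 4 blank images.
rng = np.random.default_rng(0)
digit_images = rng.integers(0, 256, size=(8, 28, 28), dtype=np.uint8)
digit_labels = rng.integers(0, 10, size=8)
blank_images = make_blanks(4)
blank_labels = np.full(4, 10)  # class 10 = blank

x, y = combine_and_shuffle([
    (preprocess(digit_images), digit_labels),
    (preprocess(blank_images), blank_labels),
])
print(x.shape, x.dtype, y.shape)
```

Using a single permutation for both arrays keeps every image paired with its label after shuffling.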

---

## Data Augmentation

To improve generalization and robustness to handwriting variation:

- `RandomRotation(±10°)`
- `RandomAffine`: scale (0.9–1.1), translate (±10%)

These transformations simulate the noise and variation found in real student handwriting.

---

## Model Architecture

```
Input: (1, 28, 28)
↓ Conv2D(1 → 32) + BatchNorm + ReLU
↓ Conv2D(32 → 64) + BatchNorm + ReLU
↓ MaxPool2d(2x2) + Dropout(0.1)
↓ Conv2D(64 → 128) + BatchNorm + ReLU
↓ MaxPool2d(2x2) + Dropout(0.1)
↓ Flatten
↓ Linear(128*7*7 → 128) + BatchNorm + ReLU + Dropout(0.2)
↓ Linear(128 → 11)
→ Output: class logits (digits 0–9, blank = 10)
```
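
The diagram above could be written as a PyTorch module along these lines. This is a sketch under stated assumptions: the class name `DigitBlankCNN` is hypothetical, and the 3×3 kernels with padding 1 are assumed because they keep the spatial size at 28×28 until pooling, which is consistent with the `128*7*7` flatten size. The shipped model may differ in these details.

```python
import torch
from torch import nn


class DigitBlankCNN(nn.Module):
    """Hypothetical reconstruction of the architecture in the diagram."""

    def __init__(self, num_classes: int = 11):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1),   # assumed kernel/padding
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(2),                              # 28x28 -> 14x14
            nn.Dropout(0.1),
            nn.Conv2d(64, 128, kernel_size=3, padding=1),
            nn.BatchNorm2d(128),
            nn.ReLU(),
            nn.MaxPool2d(2),                              # 14x14 -> 7x7
            nn.Dropout(0.1),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(128 * 7 * 7, 128),
            nn.BatchNorm1d(128),
            nn.ReLU(),
            nn.Dropout(0.2),
            nn.Linear(128, num_classes),                  # logits for 0-9 and blank
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))


# Shape check with a stand-in batch of two images.
model = DigitBlankCNN().eval()
with torch.no_grad():
    logits = model(torch.rand(2, 1, 28, 28))
print(logits.shape)
```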

---

## Training Configuration

| Hyperparameter | Value               |
| -------------- | ------------------- |
| Optimizer      | Adam (lr=0.001)     |
| Loss Function  | CrossEntropyLoss    |
| Scheduler      | ReduceLROnPlateau   |
| Early Stopping | Patience = 5        |
| Epochs         | Max 50              |
| Batch Size     | 64                  |
| Device         | CPU or CUDA         |
| Random Seed    | 42                  |
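
The table can be read as the following training loop. This is a sketch, not the contents of `train_digit_classifier.py`: random tensors and a tiny linear model stand in for the real data and CNN, and the choice to monitor validation loss, the scheduler's `factor` and `patience`, and restoring the best weights are all assumptions.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(42)  # random seed from the table


def fake_split(n: int) -> TensorDataset:
    """Stand-in data (assumption): random images with random labels 0-10."""
    return TensorDataset(torch.rand(n, 1, 28, 28), torch.randint(0, 11, (n,)))


train_loader = DataLoader(fake_split(256), batch_size=64, shuffle=True)
val_loader = DataLoader(fake_split(64), batch_size=64)
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 11))  # stand-in model

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
# factor and patience are assumed values, not taken from the table.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.5, patience=2
)

max_epochs, patience = 50, 5
best_val, stale_epochs = float("inf"), 0

for epoch in range(max_epochs):
    model.train()
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()

    # Mean validation loss over all samples.
    model.eval()
    total = 0.0
    with torch.no_grad():
        for images, labels in val_loader:
            total += criterion(model(images), labels).item() * len(labels)
    val_loss = total / len(val_loader.dataset)
    scheduler.step(val_loss)  # lower the learning rate when val loss plateaus

    if val_loss < best_val:
        best_val, stale_epochs = val_loss, 0
        best_state = {k: v.clone() for k, v in model.state_dict().items()}
    else:
        stale_epochs += 1
        if stale_epochs >= patience:  # early stopping
            break

model.load_state_dict(best_state)  # restore the best checkpoint
print(f"stopped after {epoch + 1} epochs, best val loss {best_val:.4f}")
```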

---

## Evaluation Results

| Metric               | Value     |
| -------------------- | --------- |
| Test Accuracy        | 99.73%    |
| Blank Image Accuracy | 100.00%   |

All 5,000 blank images were correctly classified.
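
For reference, the two metrics in the table can be computed from predictions as sketched below. The function name `accuracy_report` and the toy labels are illustrative only; they are not the evaluation data behind the reported numbers.

```python
import numpy as np


def accuracy_report(y_true, y_pred, blank_class=10):
    """Return (overall accuracy, accuracy restricted to true blank images)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    overall = float((y_true == y_pred).mean())
    blank_mask = y_true == blank_class
    if blank_mask.any():
        blank = float((y_pred[blank_mask] == blank_class).mean())
    else:
        blank = float("nan")  # no blank images in this sample
    return overall, blank


# Toy example: one digit is misclassified, every blank is detected.
y_true = [3, 7, 10, 10, 1, 10]
y_pred = [3, 1, 10, 10, 1, 10]
overall, blank = accuracy_report(y_true, y_pred)
print(f"overall={overall:.2%} blank={blank:.2%}")
```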

---

## Inference Examples

### 1. TorchScript (PyTorch)

```python
import torch

# Load TorchScript model
model = torch.jit.load("mnist_emnist_blank_cnn_v1.pt")
model.eval()

# Dummy input (1 image, 1 channel, 28x28) with values in [0, 1), matching the training normalization
img = torch.rand(1, 1, 28, 28)

# Predict
with torch.no_grad():
    out = model(img)
    predicted = out.argmax(dim=1).item()

print("Predicted class:", predicted)
```

### 2. ONNX (ONNX Runtime)

```python
import onnxruntime as ort
import numpy as np

# Load ONNX model
session = ort.InferenceSession("mnist_emnist_blank_cnn_v1.onnx", providers=["CPUExecutionProvider"])

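# Dummy input with values in [0, 1), matching the training normalization
img = np.random.rand(1, 1, 28, 28).astype(np.float32)

# Predict (read the input name from the model instead of hard-coding it)
input_name = session.get_inputs()[0].name
outputs = session.run(None, {input_name: img})
predicted = int(outputs[0].argmax(axis=1)[0])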

print("Predicted class:", predicted)
```


> If the prediction is `10`, the model considers the image to be blank (no digits present).
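
Both examples return raw class logits. A small helper like the one below can turn one row of logits into a readable label with a softmax confidence. The name `interpret` and the example logits are hypothetical and not part of the released files.

```python
import numpy as np

BLANK_CLASS = 10  # class index used for blank images


def interpret(logits):
    """Map one row of 11 logits to (label, softmax confidence)."""
    logits = np.asarray(logits, dtype=np.float64)
    shifted = logits - logits.max()            # subtract max for numerical stability
    probs = np.exp(shifted) / np.exp(shifted).sum()
    index = int(probs.argmax())
    label = "blank" if index == BLANK_CLASS else str(index)
    return label, float(probs[index])


# Made-up logits where the blank class dominates.
example_logits = [0.1] * 10 + [5.0]
label, confidence = interpret(example_logits)
print(label, round(confidence, 3))
```

With the TorchScript example, `out[0].tolist()` could be passed in; with ONNX Runtime, `outputs[0][0]`.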

---

## Included Files

- `train_digit_classifier.py`: Training script with full documentation
- `mnist_emnist_blank_cnn_v1.pth`: Final trained model weights
- `mnist_emnist_blank_cnn_v1.pt`: TorchScript export for deployment
- `mnist_emnist_blank_cnn_v1.onnx`: ONNX export for deployment
- `requirements.txt`: Required dependencies for training or inference

---

## Intended Use

This model was designed to support the Plom Project’s student ID digit detection system, helping automatically identify handwritten digits (and detect blank/unfilled boxes) from scanned exam sheets.

It may also be adapted for other handwritten digit classification tasks or real-time blank field detection applications.

<!-- ---

## Maintainer & Contact

- **Deep Shah** — [Hugging Face Profile](https://huggingface.co/deepshah23)
- For Plom inquiries: [The Plom Project GitLab](https://gitlab.com/plom/plom) -->