Kansallisarkisto/empty-tablecell-detection

Table cell classification

The model classifies table cell images as either empty or not empty. It was trained on table cell images from Finnish census and death record tables from the 1930s.

The model uses DenseNet121 as its base model and has been converted to the ONNX format.

Intended uses & limitations

The model has been trained to classify table cells from specific kinds of tables, which contain mainly handwritten text. It has not been tested with other types of table cell data.

Training and validation data

The training dataset consisted of:

empty cell images: 2943
non-empty cell images: 5033

The validation dataset consisted of:

empty cell images: 367
non-empty cell images: 627
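
The class balance and split ratio implied by these counts can be checked with a few lines of Python. This is a small sketch that uses only the numbers listed above; the variable and function names are illustrative and are not taken from the repository.

```python
# Image counts as listed above.
counts = {
    "train": {"empty": 2943, "non_empty": 5033},
    "val": {"empty": 367, "non_empty": 627},
}


def split_summary(counts):
    """Return per-split totals, share of empty cells, and share of all images."""
    grand_total = sum(sum(c.values()) for c in counts.values())
    summary = {}
    for split, c in counts.items():
        total = sum(c.values())
        summary[split] = {
            "total": total,
            "empty_share": c["empty"] / total,
            "share_of_all": total / grand_total,
        }
    return summary


summary = split_summary(counts)
for split, s in summary.items():
    print(f"{split}: {s['total']} images, "
          f"{s['empty_share']:.1%} empty, "
          f"{s['share_of_all']:.1%} of all images")
```

Both splits contain roughly 37% empty cells, so the validation set mirrors the class balance of the training set.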

Training procedure

The code used for model training is available in the repository in the train.py file, which uses functions from the augment.py and utils.py files. The required libraries are listed in the requirements.txt file.

The model was trained on a CPU with the following hyperparameters:

image size: 2560
learning rate: 0.0001
train batch size: 32
epochs: 15
patience: 3 epochs
optimizer: Adam
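
The patience setting implies early stopping: training can end before the 15th epoch if the monitored metric does not improve for 3 consecutive epochs. The sketch below illustrates that logic in plain Python. It assumes validation loss is the monitored metric; the `EarlyStopping` class, the `epochs_run` helper, and the loss values are hypothetical, and the actual logic in train.py may differ.

```python
# Hyperparameters as listed above.
CONFIG = {
    "image_size": 2560,
    "learning_rate": 0.0001,
    "train_batch_size": 32,
    "epochs": 15,
    "patience": 3,
    "optimizer": "Adam",
}


class EarlyStopping:
    """Signal a stop after `patience` epochs without a lower validation loss."""

    def __init__(self, patience):
        self.patience = patience
        self.best_loss = float("inf")
        self.epochs_without_improvement = 0

    def step(self, val_loss):
        """Record one epoch's loss; return True when training should stop."""
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience


def epochs_run(val_losses, config=CONFIG):
    """Return how many epochs run before early stopping or the epoch limit."""
    stopper = EarlyStopping(config["patience"])
    for epoch, loss in enumerate(val_losses[: config["epochs"]], start=1):
        if stopper.step(loss):
            return epoch
    return min(len(val_losses), config["epochs"])


# Made-up losses for illustration only: the best value occurs at epoch 3,
# followed by three epochs without improvement.
example_losses = [0.30, 0.12, 0.05, 0.06, 0.07, 0.08, 0.04]
print(epochs_run(example_losses))
```

With these made-up losses, training stops after epoch 6 and never reaches the lower loss at epoch 7, which is the usual trade-off of a short patience window.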

Evaluation results

Evaluation results using the validation dataset are listed below:

Validation loss	Validation accuracy	Validation F1-score
0.0427	0.9899	0.9903
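
For reference, accuracy and F1-score can be computed from a binary confusion matrix as sketched below. The counts in the example are hypothetical and chosen only to illustrate the formulas; they are not the model's actual confusion matrix, and which class is treated as positive is an assumption the source does not state.

```python
def accuracy(tp, fp, fn, tn):
    """Share of all predictions that are correct."""
    return (tp + tn) / (tp + fp + fn + tn)


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall for the positive class."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for illustration only.
tp, fp, fn, tn = 90, 5, 10, 95
print(f"accuracy: {accuracy(tp, fp, fn, tn):.4f}")
print(f"F1-score: {f1_score(tp, fp, fn):.4f}")
```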

Inference

Inference can be performed using the code in the test.py file.
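
As a rough illustration of what inference with the ONNX model might look like, the sketch below uses onnxruntime, NumPy and Pillow. Several details are assumptions that the source does not confirm: the model file path passed by the caller, that the model takes a single float32 input of shape (batch, 3, height, width) with fixed height and width, ImageNet-style normalization, that the first output holds raw class scores, and the class order in `LABELS`. The function names are hypothetical; test.py in the repository remains the authoritative reference.

```python
import math

# Assumption: output index 0 means "empty" and index 1 means "not empty".
LABELS = ("empty", "not empty")


def softmax(scores):
    """Convert raw scores to probabilities in a numerically stable way."""
    peak = max(scores)
    exps = [math.exp(x - peak) for x in scores]
    total = sum(exps)
    return [e / total for e in exps]


def decode(scores, labels=LABELS):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


def classify_cell(image_path, model_path):
    """Classify one table cell image with an ONNX model (hedged sketch)."""
    # Third-party imports are local so the helpers above work without them.
    import numpy as np
    import onnxruntime as ort
    from PIL import Image

    session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
    model_input = session.get_inputs()[0]
    # Assumption: input shape is (batch, 3, height, width) with fixed H and W.
    _, _, height, width = model_input.shape

    image = Image.open(image_path).convert("RGB").resize((width, height))
    pixels = np.asarray(image, dtype=np.float32) / 255.0
    # Assumption: ImageNet mean/std normalization.
    mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
    std = np.array([0.229, 0.224, 0.225], dtype=np.float32)
    pixels = (pixels - mean) / std
    # Reorder HWC to CHW and add a batch dimension.
    batch = np.ascontiguousarray(pixels.transpose(2, 0, 1)[np.newaxis, ...])

    outputs = session.run(None, {model_input.name: batch})
    # Assumption: the first output holds one row of raw class scores.
    return decode(outputs[0][0].tolist())
```

A call such as `classify_cell("cell.png", "model.onnx")` would return a label and its probability, where both file names are placeholders. If the exported model already applies a softmax, the predicted label is unaffected but the reported probability would not be meaningful.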