Model Card for Virtual_Board

This model is a fine-tuned EfficientNet-B0 convolutional neural network (CNN) that recognizes hand-drawn letters (A-Z) for a virtual board application. Integrated with OpenCV and MediaPipe for real-time hand tracking, it powers an interactive canvas for letter and word prediction, with a reported validation accuracy of about 99%. The model is trained on the pittawat/letter_recognition dataset and supports educational and communication use cases, with word prediction via Tesseract OCR and voice feedback via text-to-speech.

Model Description

The Virtual Board CNN is a fine-tuned EfficientNet-B0 model for classifying hand-drawn letters (A-Z) in real-time. Built using PyTorch, it processes grayscale images (224x224) from a virtual canvas, enabling gesture-based drawing and prediction. The model is part of an interactive application that combines computer vision (OpenCV, MediaPipe) and deep learning for educational and communication purposes, with word prediction enhanced by Tesseract OCR and text-to-speech output.

Developed by: Gokul Seetharaman
Model type: Convolutional Neural Network
License: MIT
Finetuned from model: EfficientNet-B0

Model Sources [optional]

Repository: https://github.com/gokulseetharaman/Virtual-Drawing-board-Opencv-Pytorch
Dataset: https://huggingface.co/datasets/pittawat/letter_recognition

Uses

The model is intended for direct use within the virtual board application, where it predicts hand-drawn letters (A-Z) from webcam-captured canvas images. Users draw letters using hand gestures, and the model outputs predictions in real-time, displayed on the interface with confidence scores.
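
As a rough illustration of how a prediction and its confidence score could be derived from the network output, the sketch below applies a softmax to a 26-value logit vector and maps the top class to a letter. The class order (index 0 is "A", index 25 is "Z") and the helper name predict_letter are assumptions made for this example, not details taken from the repository.

```python
import math
import string

# Assumed class order: index 0 -> "A", ..., index 25 -> "Z".
LETTERS = string.ascii_uppercase


def predict_letter(logits):
    """Return (letter, confidence) for one 26-way logit vector."""
    if len(logits) != len(LETTERS):
        raise ValueError(f"expected {len(LETTERS)} logits, got {len(logits)}")
    # Subtract the largest logit before exponentiating for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return LETTERS[best], probs[best]


# Toy logits that strongly favour class index 2.
logits = [0.0] * 26
logits[2] = 5.0
letter, confidence = predict_letter(logits)
print(f"{letter} ({confidence:.1%})")
```

In the application, the logits would come from the model's forward pass on the preprocessed canvas image rather than from a hand-built list.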

Direct Use

[More Information Needed]

Downstream Use [optional]

[More Information Needed]

Out-of-Scope Use

[More Information Needed]

Bias, Risks, and Limitations

Bias: The model was trained on the pittawat/letter_recognition dataset, which may not capture all handwriting styles or variations across demographics, potentially leading to lower accuracy for underrepresented writing patterns.

Risks: Incorrect letter predictions could mislead users in educational or communication contexts. Word prediction via Tesseract OCR may fail for poorly drawn or complex words.

Limitations: The hypothetical 99% validation accuracy is unverified without a formal evaluation script. Performance depends on webcam quality (min. 720p recommended) and clear canvas inputs. Grayscale input limits applicability to color-based tasks. Tesseract OCR's word prediction may struggle with cursive or overlapping text.

Recommendations

Users should: Verify model performance with a validation script (e.g., validation-checker.py) on diverse handwriting samples. Ensure high-quality webcam input and clear canvas drawings for optimal results. Be aware of potential biases in the dataset and test with varied handwriting styles. Consider fine-tuning for specific use cases or hardware constraints.

How to Get Started with the Model

Download best_model.pth and main.py (available from this repo and the GitHub repository listed under Model Sources), then run python main.py to start the webcam application.

Training Data

Hugging Face pittawat/letter_recognition dataset
26 classes (A-Z), split 80/20 into training and validation sets
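
A minimal sketch of an 80/20 split is shown below. The card does not say how the split was produced, so the seeded shuffle, the seed value, and the helper name split_train_val are all assumptions; the sample list is a hypothetical stand-in for (image path, label) pairs.

```python
import random


def split_train_val(samples, val_fraction=0.2, seed=42):
    """Shuffle samples reproducibly and split them into (train, val)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_val = round(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]


# Hypothetical stand-in data: labels 0-25 correspond to A-Z.
samples = [(f"img_{i:04d}.png", i % 26) for i in range(260)]
train, val = split_train_val(samples)
print(len(train), len(val))  # 208 52
```

A plain shuffle does not guarantee that every letter is equally represented in the validation set; a stratified split would be a reasonable alternative if class balance matters.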

Training Procedure

Fine-tuned EfficientNet-B0
CrossEntropyLoss, AdamW optimizer, 25 epochs, batch size 32

Preprocessing [optional]

Images resized to 224x224
Normalized with ImageNet means/std
Random data augmentation on train set
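
The normalization step can be sketched in plain Python as follows. The mean and standard deviation are the standard ImageNet values; replicating the grayscale canvas into three channels is an assumption made here because EfficientNet-B0 expects three-channel input. Resizing and augmentation are left out of this sketch.

```python
# Standard ImageNet per-channel statistics (RGB order).
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalize_gray(image):
    """Turn a 2-D grid of 0-255 gray values into three normalized channel planes.

    Assumption: the single gray channel is copied into all three channels.
    """
    channels = []
    for mean, std in zip(IMAGENET_MEAN, IMAGENET_STD):
        plane = [[(pixel / 255.0 - mean) / std for pixel in row] for row in image]
        channels.append(plane)
    return channels


tiny = [[0, 255], [128, 64]]
out = normalize_gray(tiny)
print(len(out), len(out[0]), len(out[0][0]))  # 3 2 2
```

In practice the same steps are usually expressed with an image library's transform pipeline; the pure-Python version is only meant to make the arithmetic explicit.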

Training Hyperparameters

Training regime: fp32
Epochs: 25, batch size: 32, optimizer: AdamW, LR: 5e-4

Speeds, Sizes, Times [optional]

Training time: ~90 minutes on a modern GPU (varies)
Checkpoint size: ~46MB (best_model.pth)

Factors

Performance measured per-class (precision, recall, F1-score, support)

Metrics

Overall accuracy, confusion matrix, precision/recall/F1-score per class
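
For readers who want to reproduce these metrics, the sketch below computes a confusion matrix, overall accuracy, and per-class precision, recall, F1-score, and support from lists of true and predicted class indices. The function names are illustrative, and the three-class data is a toy example, not output from this model.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Rows are true classes, columns are predicted classes."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


def per_class_report(matrix):
    """Return one dict per class with precision, recall, f1, and support."""
    n = len(matrix)
    report = []
    for c in range(n):
        tp = matrix[c][c]
        support = sum(matrix[c])
        predicted = sum(matrix[r][c] for r in range(n))
        precision = tp / predicted if predicted else 0.0
        recall = tp / support if support else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report.append(
            {"precision": precision, "recall": recall, "f1": f1, "support": support}
        )
    return report


# Toy example with three classes (0=A, 1=B, 2=C).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, 3)
report = per_class_report(cm)
print(cm, accuracy(cm))
```

For the real model, num_classes would be 26 and the index lists would come from running the validation set through the network.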

Results

Validation accuracy: 99.04%
Full confusion matrix and metrics in GitHub README

Environmental Impact

Estimated training: <1.5 GPU-hours; carbon footprint is minimal for local or single-GPU cloud runs
Hardware: NVIDIA GeForce RTX 4060 Laptop GPU
Hours used: ~1.5

Model Architecture and Objective

See "Model Description" and the GitHub repo for the full PyTorch code.

Compute Infrastructure

EfficientNet-B0 was fine-tuned on an NVIDIA RTX 4060 Laptop GPU (8GB VRAM) with 16GB RAM, Windows 11, and Python 3.10.

Hardware

GPU: RTX 4060 (or equivalent; CPU optional)
RAM: 16GB

Software

Python 3.10, PyTorch, OpenCV, NumPy, mediapipe, pyttsx3

Citation

BibTeX:

@misc{gokulseetharaman2025wastecnn,
  title={Virtual-Drawing-Board-Opencv-pytorch},
  author={Gokul Seetharaman},
  year={2025},
  url={https://github.com/gokulseetharaman/Virtual-Drawing-board-Opencv-Pytorch}
}

APA: Gokul Seetharaman. (2025). Virtual-Drawing-board-Opencv-Pytorch. https://github.com/gokulseetharaman/Virtual-Drawing-board-Opencv-Pytorch

Model Card Contact

GitHub Issues
