Model Card for Virtual Board CNN

This model is a fine-tuned EfficientNet-B0 convolutional neural network (CNN) that recognizes hand-drawn letters (A-Z) for a virtual board application. Integrated with OpenCV and MediaPipe for real-time hand tracking, it powers an interactive canvas for letter and word prediction. The reported validation accuracy is about 99%, a figure the card itself notes has not yet been verified with a formal evaluation script. The model is trained on the pittawat/letter_recognition dataset and supports educational and communication use cases, with word prediction via Tesseract OCR and voice feedback via text-to-speech.

Model Description

The Virtual Board CNN is a fine-tuned EfficientNet-B0 model for classifying hand-drawn letters (A-Z) in real-time. Built using PyTorch, it processes grayscale images (224x224) from a virtual canvas, enabling gesture-based drawing and prediction. The model is part of an interactive application that combines computer vision (OpenCV, MediaPipe) and deep learning for educational and communication purposes, with word prediction enhanced by Tesseract OCR and text-to-speech output.

  • Developed by: Gokul Seetharaman
  • Model type: Convolutional Neural Network
  • License: MIT
  • Finetuned from model: EfficientNet-B0

Model Sources [optional]

  • Repository: https://github.com/gokulseetharaman/Virtual-Drawing-board-Opencv-Pytorch

Uses

The model is intended for direct use within the virtual board application, where it predicts hand-drawn letters (A-Z) from webcam-captured canvas images. Users draw letters using hand gestures, and the model outputs predictions in real-time, displayed on the interface with confidence scores.
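
As a rough illustration of how a letter and its confidence score can be derived from the network's raw outputs, the sketch below applies a softmax to a 26-element logit vector and picks the most probable class. It is not taken from main.py: the function names, the example logits, and the class order (index 0 is "A", index 25 is "Z") are assumptions made for this example.

```python
import math
import string

# Assumed class order: index 0 -> "A", ..., index 25 -> "Z".
LETTERS = string.ascii_uppercase


def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_letter(logits):
    """Return (letter, confidence) for one 26-way logit vector."""
    if len(logits) != len(LETTERS):
        raise ValueError("expected one logit per letter A-Z")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LETTERS[best], probs[best]


# Hypothetical logits: every class scores 0.0 except "C" (index 2).
example_logits = [0.0] * 26
example_logits[2] = 5.0
letter, confidence = predict_letter(example_logits)
print(f"{letter} ({confidence:.1%})")
```

In the real application the logits would come from the model's forward pass on a canvas image; here they are hard-coded so the snippet runs without PyTorch.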

Direct Use

[More Information Needed]

Downstream Use [optional]

[More Information Needed]

Out-of-Scope Use

[More Information Needed]

Bias, Risks, and Limitations

Bias: The model was trained on the pittawat/letter_recognition dataset, which may not capture all handwriting styles or variations across demographics, potentially leading to lower accuracy for underrepresented writing patterns.

Risks: Incorrect letter predictions could mislead users in educational or communication contexts. Word prediction via Tesseract OCR may fail for poorly drawn or complex words.

Limitations: The reported ~99% validation accuracy has not been verified with a formal evaluation script. Performance depends on webcam quality (min. 720p recommended) and clear canvas inputs. Grayscale input limits applicability to color-based tasks. Tesseract OCR’s word prediction may struggle with cursive or overlapping text.

Recommendations

Users should:

  • Verify model performance with a validation script (e.g., validation-checker.py) on diverse handwriting samples.
  • Ensure high-quality webcam input and clear canvas drawings for optimal results.
  • Be aware of potential biases in the dataset and test with varied handwriting styles.
  • Consider fine-tuning for specific use cases or hardware constraints.

How to Get Started with the Model

Download best_model.pth and main.py (available from this repository and the GitHub repository), then run python main.py to launch the webcam application.

Training Data

  • Dataset: pittawat/letter_recognition

Training Procedure

  • Fine-tuned EfficientNet-B0
  • CrossEntropyLoss, AdamW optimizer, 25 epochs, batch size 32

Preprocessing [optional]

  • Images resized to 224x224
  • Normalized with ImageNet means/std
  • Random data augmentation on train set
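
To make the normalization step concrete, here is a small worked example for a single pixel. It uses the standard ImageNet channel means and standard deviations. One assumption is not confirmed by this card: that each grayscale value is copied to all three channels before normalization, as is common when feeding grayscale images to an ImageNet-pretrained network. The function name is hypothetical.

```python
# Standard ImageNet channel statistics (R, G, B).
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalize_gray_pixel(value):
    """Map one 8-bit grayscale value to three normalized channel values.

    Assumption: the grayscale value is replicated to R, G, and B before
    normalization; the model card does not state how channels are handled.
    """
    if not 0 <= value <= 255:
        raise ValueError("pixel value must be in 0..255")
    scaled = value / 255.0  # bring the value into the 0..1 range first
    return tuple((scaled - m) / s for m, s in zip(IMAGENET_MEAN, IMAGENET_STD))


black = normalize_gray_pixel(0)    # darkest possible input
white = normalize_gray_pixel(255)  # brightest possible input
print(black)
print(white)
```

Because the three channels have different statistics, one grayscale value yields three slightly different normalized values.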

Training Hyperparameters

  • Training regime: fp32
  • Epochs: 25, batch size: 32, optimizer: AdamW, LR: 5e-4

Speeds, Sizes, Times [optional]

  • Training time: ~90 minutes on a modern GPU (varies)
  • Checkpoint size: ~46MB (best_model.pth)

Factors

  • Performance measured per-class (precision, recall, F1-score, support)

Metrics

  • Overall accuracy, confusion matrix, precision/recall/F1-score per class
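
For readers who want to reproduce these metrics, the sketch below derives per-class precision, recall, F1-score, and support, plus overall accuracy, from a confusion matrix. The function names and the 3-class matrix are made up for illustration and are not results from this model; the real matrix would have 26 rows and columns.

```python
def per_class_metrics(confusion):
    """Compute precision, recall, F1, and support for each class.

    confusion[i][j] = number of samples with true class i predicted as class j.
    """
    n = len(confusion)
    rows = []
    for k in range(n):
        tp = confusion[k][k]
        support = sum(confusion[k])                         # samples whose true class is k
        predicted = sum(confusion[i][k] for i in range(n))  # samples predicted as k
        precision = tp / predicted if predicted else 0.0
        recall = tp / support if support else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        rows.append({"precision": precision, "recall": recall,
                     "f1": f1, "support": support})
    return rows


def overall_accuracy(confusion):
    """Fraction of all samples that lie on the diagonal."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(len(confusion)))
    return correct / total if total else 0.0


# Made-up 3-class matrix for illustration only.
toy = [
    [8, 2, 0],
    [1, 9, 0],
    [0, 0, 10],
]
metrics = per_class_metrics(toy)
accuracy = overall_accuracy(toy)
print(accuracy)
for row in metrics:
    print(row)
```

The zero-denominator guards mean a class that is never predicted, or never present, gets 0.0 rather than raising an error.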

Results

  • Validation accuracy: 99.04%
  • Full confusion matrix and metrics in GitHub README

Environmental Impact

  • Estimated training: <1.5 GPU-hours; carbon footprint is minimal for local or single-GPU cloud runs
  • Hardware: NVIDIA GeForce RTX 4060 Laptop GPU
  • Hours used: ~1.5

Model Architecture and Objective

  • See "Model Description" and the GitHub repo for the full PyTorch code.

Compute Infrastructure

  • EfficientNet-B0 was fine-tuned on an NVIDIA GeForce RTX 4060 Laptop GPU (8GB VRAM) with 16GB RAM, Windows 11, and Python 3.10

Hardware

  • GPU: RTX 4060 (or equivalent; CPU optional)
  • RAM: 16GB

Software

  • Python 3.10, PyTorch, OpenCV, NumPy, mediapipe, pyttsx3

Citation

BibTeX:

@misc{gokulseetharaman2025wastecnn,
  title={Virtual-Drawing-Board-Opencv-pytorch},
  author={Gokul Seetharaman},
  year={2025},
  url={https://github.com/gokulseetharaman/Virtual-Drawing-board-Opencv-Pytorch}
}

APA: Gokul Seetharaman. (2025). Virtual-Drawing-board-Opencv-Pytorch. https://github.com/gokulseetharaman/Virtual-Drawing-board-Opencv-Pytorch

Model Card Contact

GitHub Issues
