Model Card for Virtual Board CNN
This model is a fine-tuned EfficientNet-B0 Convolutional Neural Network (CNN) designed to recognize hand-drawn letters (A-Z) for a virtual board application. Integrated with OpenCV and MediaPipe for real-time hand tracking, it powers an interactive canvas for letter and word prediction. The reported validation accuracy is 99.04%, which has not yet been confirmed by a formal evaluation script. The model is trained on the pittawat/letter_recognition dataset and supports educational and communication use cases, with word prediction via Tesseract OCR and voice feedback via text-to-speech.
Model Description
The Virtual Board CNN is a fine-tuned EfficientNet-B0 model that classifies hand-drawn letters (A-Z) in real time. Built with PyTorch, it processes 224x224 grayscale images from a virtual canvas, enabling gesture-based drawing and prediction. The model is part of an interactive application that combines computer vision (OpenCV, MediaPipe) and deep learning for educational and communication purposes. Word prediction is handled by Tesseract OCR, and results are read aloud through text-to-speech.
- Developed by: Gokul Seetharaman
- Model type: Convolutional Neural Network
- License: MIT
- Finetuned from model: EfficientNet-B0 (google/efficientnet-b0)
Model Sources [optional]
- Repository: https://github.com/gokulseetharaman/Virtual-Drawing-board-Opencv-Pytorch
- Dataset: https://huggingface.co/datasets/pittawat/letter_recognition
Uses
The model is intended for direct use within the virtual board application, where it predicts hand-drawn letters (A-Z) from webcam-captured canvas images. Users draw letters using hand gestures, and the model outputs predictions in real time. Each prediction is displayed on the interface with its confidence score.
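As an illustration of how a letter and its confidence score could be derived from the classifier output, here is a minimal sketch in plain Python. It assumes the model emits 26 raw logits and that class indices follow alphabetical order (0 is "A", 25 is "Z"); neither detail is confirmed by this card, and the function names are hypothetical.

```python
import math
import string

# Assumed class order: index 0 -> "A", ..., index 25 -> "Z".
LETTERS = string.ascii_uppercase


def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_prediction(logits):
    """Return (letter, confidence) for one 26-way logit vector."""
    if len(logits) != len(LETTERS):
        raise ValueError(f"expected {len(LETTERS)} logits, got {len(logits)}")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LETTERS[best], probs[best]


# Made-up logits for illustration: class index 2 ("C") scores highest.
example_logits = [0.0] * 26
example_logits[2] = 5.0
letter, confidence = top_prediction(example_logits)
print(f"{letter} ({confidence:.1%})")
```

In the real application the logits would come from the PyTorch model's forward pass; the softmax step is what turns them into a score that can be shown as a percentage.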
Direct Use
[More Information Needed]
Downstream Use [optional]
[More Information Needed]
Out-of-Scope Use
[More Information Needed]
Bias, Risks, and Limitations
Bias: The model was trained on the pittawat/letter_recognition dataset, which may not capture all handwriting styles or variations across demographics, potentially leading to lower accuracy for underrepresented writing patterns.
Risks: Incorrect letter predictions could mislead users in educational or communication contexts. Word prediction via Tesseract OCR may fail for poorly drawn or complex words.
Limitations: The reported ~99% validation accuracy remains unverified until it is reproduced with a formal evaluation script. Performance depends on webcam quality (720p minimum recommended) and clear canvas inputs. Grayscale input limits applicability to color-based tasks. Tesseract OCR’s word prediction may struggle with cursive or overlapping text.
Recommendations
Users should:
- Verify model performance with a validation script (e.g., validation-checker.py) on diverse handwriting samples.
- Ensure high-quality webcam input and clear canvas drawings for optimal results.
- Be aware of potential biases in the dataset and test with varied handwriting styles.
- Consider fine-tuning for specific use cases or hardware constraints.
How to Get Started with the Model
Download best_model.pth and main.py from this repository and the GitHub repository, and place them in the same directory. Then run python main.py to start the webcam application.
Training Data
- Hugging Face pittawat/letter_recognition dataset
- 26 classes (split 80/20 train/val)
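An 80/20 split like the one listed above can be sketched as follows. This is only an illustration: the card does not say whether the original split was seeded, shuffled, or stratified, so the seed, the function name, and the placeholder samples are all assumptions.

```python
import random


def split_train_val(items, val_fraction=0.2, seed=42):
    """Shuffle a copy of `items` and split it into (train, val) lists."""
    if not 0.0 < val_fraction < 1.0:
        raise ValueError("val_fraction must be between 0 and 1")
    shuffled = list(items)
    # A dedicated Random instance keeps the split reproducible.
    random.Random(seed).shuffle(shuffled)
    n_val = round(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]


# Hypothetical (file name, label) pairs standing in for the real dataset.
samples = [(f"img_{i:04d}.png", chr(ord("A") + i % 26)) for i in range(260)]
train, val = split_train_val(samples)
print(len(train), len(val))  # 208 52
```

A plain shuffle does not guarantee that every letter appears in the validation set in equal proportion; a stratified split would be the safer choice for small or imbalanced data.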
Training Procedure
- Fine-tuned EfficientNet-B0
- CrossEntropyLoss, AdamW optimizer, 25 epochs, batch size 32
Preprocessing [optional]
- Images resized to 224x224
- Normalized with ImageNet means/std
- Random data augmentation on train set
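The normalization step can be illustrated with plain Python arithmetic. The sketch below assumes the commonly used ImageNet statistics (mean 0.485/0.456/0.406, std 0.229/0.224/0.225) and assumes that each grayscale pixel is replicated across three channels before normalization; the card does not confirm either detail. Resizing and augmentation are omitted, and the function names are hypothetical.

```python
# Commonly used ImageNet channel statistics (assumed to match the card).
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalize_gray_pixel(value):
    """Map one 8-bit grayscale value to three normalized channel values."""
    if not 0 <= value <= 255:
        raise ValueError("pixel value must be in [0, 255]")
    scaled = value / 255.0  # bring the pixel into [0, 1]
    return tuple((scaled - m) / s for m, s in zip(IMAGENET_MEAN, IMAGENET_STD))


def normalize_gray_image(rows):
    """Normalize a 2-D list of grayscale pixels into three (H, W) planes."""
    planes = [[], [], []]
    for row in rows:
        normalized = [normalize_gray_pixel(v) for v in row]
        for c in range(3):
            planes[c].append([px[c] for px in normalized])
    return planes


# Tiny 2x2 image for illustration; real inputs would be 224x224.
tiny = [[0, 255], [128, 64]]
channels = normalize_gray_image(tiny)
print([round(v, 3) for v in channels[0][0]])
```

In a PyTorch pipeline the same arithmetic is usually delegated to a transform, but writing it out makes clear that inference inputs must go through exactly the same scaling as the training images.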
Training Hyperparameters
- Training regime: fp32
- Epochs: 25, batch size: 32, optimizer: AdamW, LR: 5e-4
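The hyperparameters above could be wired into a fine-tuning setup roughly as follows. This is a sketch under stated assumptions, not the author's training script: it assumes the torchvision implementation of EfficientNet-B0 (the original code may use a different package), and `TrainConfig` and `build_training_objects` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters as listed in this card."""
    num_classes: int = 26
    epochs: int = 25
    batch_size: int = 32
    learning_rate: float = 5e-4


def build_training_objects(config, pretrained=True):
    """Return (model, criterion, optimizer) for fine-tuning.

    torch and torchvision are imported lazily so the config above can be
    inspected on machines that do not have them installed.
    """
    import torch
    from torch import nn
    from torchvision import models

    weights = models.EfficientNet_B0_Weights.DEFAULT if pretrained else None
    model = models.efficientnet_b0(weights=weights)
    # Swap the 1000-class ImageNet head for a 26-way letter classifier.
    in_features = model.classifier[1].in_features
    model.classifier[1] = nn.Linear(in_features, config.num_classes)

    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.AdamW(model.parameters(), lr=config.learning_rate)
    return model, criterion, optimizer


config = TrainConfig()
print(config)
```

Calling `build_training_objects(config)` requires PyTorch and torchvision and downloads the pretrained weights on first use; passing `pretrained=False` skips the download.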
Speeds, Sizes, Times [optional]
- Training time: ~90 minutes on a modern GPU (varies)
- Checkpoint size: ~46MB (best_model.pth)
Factors
- Performance measured per-class (precision, recall, F1-score, support)
Metrics
- Overall accuracy, confusion matrix, precision/recall/F1-score per class
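For readers who want to reproduce these metrics, the sketch below derives overall accuracy and per-class precision, recall, F1-score, and support from a confusion matrix in plain Python. The 3-class matrix is made up for illustration and is not a result from this model; the function names are hypothetical.

```python
def per_class_metrics(confusion, labels):
    """Compute precision, recall, F1 and support per class.

    confusion[i][j] counts samples whose true class is i and predicted class is j.
    """
    n = len(labels)
    report = {}
    for k, label in enumerate(labels):
        tp = confusion[k][k]
        predicted = sum(confusion[i][k] for i in range(n))  # column sum
        support = sum(confusion[k])  # row sum: true samples of class k
        precision = tp / predicted if predicted else 0.0
        recall = tp / support if support else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[label] = {
            "precision": precision,
            "recall": recall,
            "f1": f1,
            "support": support,
        }
    return report


def overall_accuracy(confusion):
    """Fraction of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total if total else 0.0


# Made-up 3-class matrix for illustration only; not results from this model.
labels = ["A", "B", "C"]
confusion = [
    [48, 1, 1],
    [2, 47, 1],
    [0, 3, 47],
]
report = per_class_metrics(confusion, labels)
print(f"accuracy = {overall_accuracy(confusion):.4f}")
for label, row in report.items():
    print(label, {name: round(value, 3) for name, value in row.items()})
```

The same functions apply unchanged to a 26x26 matrix with labels A through Z.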
Results
- Validation accuracy: 99.04%
- Full confusion matrix and metrics in GitHub README
Environmental Impact
- Estimated training: under 1.5 GPU-hours; the carbon footprint is minimal for local or single-GPU cloud runs
- Hardware: NVIDIA GeForce RTX 4060 Laptop GPU
- Hours used: ~1.5
Model Architecture and Objective
- See "Model Description" above and the GitHub repo for the full PyTorch code.
Compute Infrastructure
- EfficientNet-B0 was fine-tuned on an NVIDIA GeForce RTX 4060 Laptop GPU (8GB VRAM) with 16GB RAM, Windows 11, and Python 3.10
Hardware
- GPU: RTX 4060 or equivalent (optional; CPU also works)
- RAM: 16GB
Software
- Python 3.10, PyTorch, OpenCV, NumPy, mediapipe, pyttsx3
Citation
BibTeX:
@misc{gokulseetharaman2025wastecnn,
title={Virtual-Drawing-Board-Opencv-pytorch},
author={Gokul Seetharaman},
year={2025},
url={https://github.com/gokulseetharaman/Virtual-Drawing-board-Opencv-Pytorch}
}
APA: Gokul Seetharaman. (2025). Virtual-Drawing-board-Opencv-Pytorch. https://github.com/gokulseetharaman/Virtual-Drawing-board-Opencv-Pytorch
Model Card Contact