Model Card for Bangla Handwritten Character Recognition

This is a Bangla handwritten character recognition model that classifies 195 character classes, including compound characters. It is designed for accurate offline recognition of handwritten Bangla script.

Model Details

Architecture: EfficientNetV2S (feature extractor)

Custom Layers: Triple-head architecture (two heads with attention, one baseline head), followed by a soft-average ensemble layer.

Parameters: 30,912,695 (total)

Framework: TensorFlow / Keras

Finetuned from: EfficientNetV2S pretrained on ImageNet

Developed by: Team Segfault

License: MIT

Model Description

This model is a deep learning-based handwritten character classifier designed specifically for the Bangla language. It recognizes 195 different Bangla characters, including basic vowels and consonants as well as compound letters.

Built on EfficientNetV2S as a feature extractor, the model applies a custom triple-head fully connected (FC) architecture in which two heads use an attention mechanism and the third is a plain baseline head. The outputs of the three heads are averaged (soft voting) to produce the final prediction, which is intended to improve generalization and reduce overfitting.
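The soft-voting step can be illustrated with a minimal sketch. The three probability vectors below are made-up values for a hypothetical 3-class problem (the real model outputs 195 probabilities per head), and `soft_vote` is an illustrative name, not a function from the model's code base.

```python
def soft_vote(head_probs):
    """Average per-class probabilities across heads (soft voting)."""
    n_heads = len(head_probs)
    n_classes = len(head_probs[0])
    if any(len(p) != n_classes for p in head_probs):
        raise ValueError("all heads must output the same number of classes")
    return [sum(p[c] for p in head_probs) / n_heads for c in range(n_classes)]


def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=values.__getitem__)


# Made-up outputs of the three heads for one image (each row sums to 1).
head_1 = [0.50, 0.45, 0.05]
head_2 = [0.48, 0.47, 0.05]
head_3 = [0.10, 0.85, 0.05]

averaged = soft_vote([head_1, head_2, head_3])
predicted_class = argmax(averaged)
```

In this example two heads narrowly prefer class 0, yet the averaged distribution favors class 1 because the third head is far more confident. A majority (hard) vote would have picked class 0, which is the practical difference between the two strategies.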

The goal of this model is to support accurate offline recognition of handwritten Bangla text, which is useful for educational tools, document digitization, and OCR applications focused on Bangla script. It reaches about 96.6% accuracy on the held-out test split of a class-balanced dataset, helped by a data augmentation pipeline that includes ElasticTransform and random rotation.

This model was trained on the MatrivashaBangla dataset, a newly combined dataset built from BanglaLekha-Isolated and Matrivasha_raw that contains approximately 500,000 samples. The entire training and evaluation pipeline is implemented in TensorFlow/Keras.

Developed by: Team Segfault
Model type: Deep neural network
License: MIT
Finetuned from model: EfficientNetV2S

Uses

Offline OCR systems for Bangla script
Educational software or Bangla digitization tools
Document recognition systems and similar applications

Out-of-Scope Use

Not intended for real-time inference on low-power devices
Not designed for other scripts or languages

Training Details

Training Data

Dataset Name: MatrivashaBangla (a newly combined dataset built from BanglaLekha-Isolated and Matrivasha_raw)

Size: 195 classes, ~500,000 images

Augmentations: ElasticTransform, Rotate

Balance: Class-balanced via custom augmentation scripts (2500-2580 images per class)
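As a quick consistency check on these figures, the per-class range can be multiplied out, and a balancing script might compute how many augmented images each class needs along the lines below. This is only a sketch: the target of 2,500, the function name, and the raw counts in the example are hypothetical, since the card states only the resulting range of 2500-2580 images per class.

```python
NUM_CLASSES = 195
MIN_PER_CLASS, MAX_PER_CLASS = 2500, 2580  # range stated in this card

# Bounds on the dataset size implied by the per-class range.
min_total = NUM_CLASSES * MIN_PER_CLASS  # 487,500
max_total = NUM_CLASSES * MAX_PER_CLASS  # 503,100


def augmented_copies_needed(raw_counts, target=MIN_PER_CLASS):
    """Return, per class, how many augmented images would lift it to `target`.

    Classes already at or above the target need none.
    """
    return {label: max(0, target - count) for label, count in raw_counts.items()}


# Hypothetical raw counts for three classes (not taken from the dataset).
example_counts = {"class_a": 1900, "class_b": 2500, "class_c": 2580}
needed = augmented_copies_needed(example_counts)
```

Both bounds are consistent with the reported size of roughly 500,000 images.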

Evaluation

Train Accuracy: ~97.59%

Validation Accuracy: ~96.69%

Metrics: Accuracy, F1-score, Precision, Recall

Evaluation Dataset: 10% test split of MatrivashaBangla; test accuracy ~96.60%
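The card lists accuracy, precision, recall, and F1-score without saying how the per-class values are aggregated across the 195 classes. The sketch below assumes macro averaging (an assumption, not something the card states) and uses made-up labels for a 3-class toy problem. `classification_metrics` is an illustrative name.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1.

    Macro averaging is an assumption; the card does not name the scheme.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        p = tp[label] / p_den if p_den else 0.0
        r = tp[label] / r_den if r_den else 0.0
        f1 = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Made-up labels for illustration only.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
metrics = classification_metrics(y_true, y_pred)
```

With a class-balanced test split, macro and support-weighted averages are close to each other, so the choice matters less here than it would on an imbalanced dataset.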

Model Architecture

EfficientNetV2S (Frozen Layers)
        ↓
Feature Output (Shared)
        ↓
GlobalAveragePooling
        ↓
├── FC Head 1 with Attention
├── FC Head 2 with Attention
└── FC Head 3 (Baseline)
        ↓
Soft Average of 3 Head Outputs
        ↓
Final Prediction (195 classes)
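The data flow in this diagram could be expressed in Keras roughly as follows. This is a sketch under several assumptions, not the authors' implementation: the card does not specify the attention mechanism, so a simple sigmoid feature gate stands in for it. The input size (128x128x3), the hidden width (512), and all function and layer names are hypothetical. The parameter count of this sketch is not expected to match the 30,912,695 reported above.

```python
import tensorflow as tf
from tensorflow.keras import layers

NUM_CLASSES = 195


def attention_head(features, name):
    """FC head with a sigmoid feature gate (assumed form of attention)."""
    # The gate re-weights each pooled feature before classification.
    gate = layers.Dense(features.shape[-1], activation="sigmoid",
                        name=f"{name}_gate")(features)
    attended = layers.Multiply(name=f"{name}_attended")([features, gate])
    hidden = layers.Dense(512, activation="relu", name=f"{name}_fc")(attended)
    return layers.Dense(NUM_CLASSES, activation="softmax",
                        name=f"{name}_probs")(hidden)


def baseline_head(features, name):
    """Plain FC head without attention."""
    hidden = layers.Dense(512, activation="relu", name=f"{name}_fc")(features)
    return layers.Dense(NUM_CLASSES, activation="softmax",
                        name=f"{name}_probs")(hidden)


def build_model(input_shape=(128, 128, 3), weights=None):
    """Frozen EfficientNetV2S backbone, three heads, averaged softmax outputs."""
    backbone = tf.keras.applications.EfficientNetV2S(
        include_top=False, weights=weights, input_shape=input_shape
    )
    backbone.trainable = False  # "Frozen Layers" in the diagram

    inputs = tf.keras.Input(shape=input_shape)
    feature_map = backbone(inputs, training=False)
    pooled = layers.GlobalAveragePooling2D(name="shared_pool")(feature_map)

    heads = [
        attention_head(pooled, "head1"),
        attention_head(pooled, "head2"),
        baseline_head(pooled, "head3"),
    ]
    # Soft average of the three per-class probability vectors.
    outputs = layers.Average(name="soft_average")(heads)
    return tf.keras.Model(inputs, outputs, name="triple_head_sketch")


model = build_model()
```

Passing `weights="imagenet"` would load the pretrained backbone that the card refers to. The default here is `None` so that the sketch builds without a download.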

Limitations

May misclassify extremely poor handwriting or characters written with noisy backgrounds
Not tested for real-time edge deployment
Trained only on standard handwritten script; cursive, artistic, and stylized forms are not covered

Environmental Impact

Compute Region: Local training (not on cloud)
Carbon Emitted: ~5–10 kg CO2

Model Card Contact

Author: Meharaz Hossain

Email: [email protected]

GitHub: https://github.com/meharaz733

Hugging Face: https://huggingface.co/meharaz733
