# MonOCR: Production-Ready Mon Language OCR
MonOCR is an Optical Character Recognition (OCR) model engineered for the Mon language (mnw). Optimized for speed and accuracy, it recognizes Mon script in documents, digital text, and scene images.
This repository serves as the official distribution point for MonOCR model weights in deployment-ready formats.
## Software Development Kits
Unified SDKs are available for seamless integration into existing applications. These SDKs handle model caching, image preprocessing, and inference out of the box.
| SDK | Platform | Registry |
|---|---|---|
| monocr-onnx | Python | PyPI |
| monocr | Node.js | npm |
| monocr-go | Go | GitHub |
## Model Checkpoints
| Format | Path | Intended Use Case |
|---|---|---|
| ONNX | onnx/monocr.onnx | Standard deployments (server/desktop). |
| TFLite (int8) | tflite/monocr.tflite | Extreme edge/mobile (low latency, minimized size). |
| TFLite (fp16) | tflite/float16.tflite | High-efficiency mobile GPU acceleration. |
| TFLite (fp32) | tflite/float32.tflite | High-precision mobile inference. |
| PyTorch | pytorch/monocr.ckpt | Training, fine-tuning, and research. |
## Performance Metrics
| Metric | Value |
|---|---|
| Train Loss | 1.22 |
| Validation Loss | 1.157 |
| CER | 0.025 |
| WER | 0.211 |
| Epochs | 27 |
| Best Checkpoint | monocr-epoch=27-val_loss=1.157-val_cer=0.025.ckpt |
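The Character Error Rate (CER) reported above is the standard edit-distance metric: the Levenshtein distance between the predicted and reference strings, divided by the reference length. A minimal, dependency-free sketch (not part of the MonOCR SDKs):

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def cer(prediction: str, reference: str) -> float:
    """Character Error Rate: edit distance normalized by reference length."""
    return levenshtein(prediction, reference) / max(len(reference), 1)
```

WER is computed the same way over word tokens instead of characters.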
## Dataset Summary
- Total samples: 3,030,000
- Train size: 3,000,000
- Validation size: 30,000
- Data source description: Procedural synthetic text generation across multiple Mon fonts, combined with real-world digit corpora.
- Augmentation strategy: image-level augmentations (noise, blur, and transformations) applied during training.
## Model Specifications
- Architecture type: MobileNetV3-Large Backbone + 2-layer BiLSTM + Linear CTC Head
- Parameter count: 6.58M parameters
- Model size: 100.73 MB (PyTorch Checkpoint)
- Training hardware: NVIDIA GPU (Single GPU run)
- Training time: ~2-4 days
## Reproducibility
- Optimizer: AdamW
- Learning rate: 0.0001 (Warmup + Cosine Annealing)
- Batch size: 48 (with Gradient Accumulation = 4)
- Loss function: CTCLoss (with label smoothing $\epsilon=0.05$)
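The warmup-plus-cosine schedule above can be expressed as a simple step-to-learning-rate function. The card only states the base learning rate (1e-4); the warmup and total step counts below are illustrative assumptions, not values from the training run:

```python
import math

def lr_at(step: int, base_lr: float = 1e-4,
          warmup_steps: int = 1000, total_steps: int = 100_000) -> float:
    """Linear warmup to base_lr, then cosine annealing toward zero.

    warmup_steps and total_steps are assumed values for illustration;
    the model card specifies only base LR 1e-4 with warmup + cosine.
    """
    if step < warmup_steps:
        # Linear ramp from 0 to base_lr over the warmup phase.
        return base_lr * step / warmup_steps
    # Cosine decay from base_lr down to 0 over the remaining steps.
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

In PyTorch this is typically wired up with a `LambdaLR` scheduler wrapping the AdamW optimizer.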
## Technical Specification
- Input Tensors: Grayscale (1-channel), 128 px height, variable width.
- Image Preprocessing: Aspect-ratio-preserving resize to 128 px height, followed by [0, 1] pixel normalization.
- Decoding Strategy: Connectionist Temporal Classification (CTC) beam search decoding (width = 10).
- Vocabulary: 315 characters (Mon, Burmese, digits, punctuation, and symbols). Encoding is standard UTF-8 (see charset.txt).
## Integration Guidelines
For developers building custom drivers:
- Refer to charset.txt for the index-to-character mapping (index 0 is reserved for `<blank>`).
- Ensure input images are high-contrast and properly scaled to 128 px height.
- ONNX models use dynamic axes for width to support varying word lengths without padding.
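For custom drivers, the CTC output must be decoded by collapsing repeated indices and dropping blanks (index 0). A greedy decoder is shown here for clarity; the model card specifies beam search with width 10 for production use:

```python
def ctc_greedy_decode(argmax_indices, charset, blank: int = 0) -> str:
    """Greedy CTC decoding: collapse repeats, then drop blank tokens.

    argmax_indices: per-timestep argmax of the model's logits.
    charset: index-to-character list loaded from charset.txt,
             where index 0 is <blank>.
    """
    out = []
    prev = None
    for idx in argmax_indices:
        # Emit a character only when it differs from the previous
        # timestep (repeat collapse) and is not the blank token.
        if idx != prev and idx != blank:
            out.append(charset[idx])
        prev = idx
    return "".join(out)
```

Example: with a charset of `["<blank>", "a", "b", "c"]`, the index sequence `[1, 1, 0, 2, 2, 0, 3]` decodes to `"abc"`.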
## License
All model weights and metadata are provided under the MIT License.
## Citation
```bibtex
@software{monocr2026,
  author = {Janakh},
  title  = {MonOCR: Production-Ready OCR for Mon Language},
  year   = {2026},
  url    = {https://huggingface.co/janakhpon/monocr}
}
```