MonOCR: Production-Ready Mon Language OCR

MonOCR is an Optical Character Recognition (OCR) model engineered for the Mon language (mnw). Optimized for speed and accuracy, it recognizes Mon script in documents, digital text, and scene images.

This repository serves as the official distribution point for MonOCR model weights in deployment-ready formats.

Software Development Kits

Unified SDKs are available for seamless integration into existing applications. These SDKs handle model caching, image preprocessing, and inference out of the box.

| SDK | Platform | Registry |
| --- | --- | --- |
| monocr-onnx | Python | PyPI |
| monocr | Node.js | npm |
| monocr-go | Go | GitHub |

Model Checkpoints

| Format | Path | Intended Use Case |
| --- | --- | --- |
| ONNX | onnx/monocr.onnx | Standard deployments (server/desktop). |
| TFLite (int8) | tflite/monocr.tflite | Extreme edge/mobile (low latency, minimized size). |
| TFLite (fp16) | tflite/float16.tflite | High-efficiency mobile GPU acceleration. |
| TFLite (fp32) | tflite/float32.tflite | High-precision mobile inference. |
| PyTorch | pytorch/monocr.ckpt | Training, fine-tuning, and research. |

Performance Metrics

| Metric | Value |
| --- | --- |
| Train Loss | 1.22 |
| Validation Loss | 1.157 |
| CER | 0.025 |
| WER | 0.211 |
| Epochs | 27 |
| Best Checkpoint | monocr-epoch=27-val_loss=1.157-val_cer=0.025.ckpt |

Dataset Summary

  • Total samples: 3,030,000
  • Train size: 3,000,000
  • Validation size: 30,000
  • Data source description: Procedural synthetic text generation across multiple Mon fonts, combined with real-world digit corpora.
  • Augmentation strategy: Image-level augmentations (noise, blur, and geometric transformations) applied during training.

Model Specifications

  • Architecture type: MobileNetV3-Large Backbone + 2-layer BiLSTM + Linear CTC Head
  • Parameter count: 6.58M parameters
  • Model size: 100.73 MB (PyTorch Checkpoint)
  • Training hardware: NVIDIA GPU (Single GPU run)
  • Training time: ~2-4 days

Reproducibility

  • Optimizer: AdamW
  • Learning rate: 0.0001 (Warmup + Cosine Annealing)
  • Batch size: 48 (with Gradient Accumulation = 4)
  • Loss function: CTCLoss (with label smoothing $\epsilon=0.05$)
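The "Warmup + Cosine Annealing" schedule above can be sketched as a simple step-to-rate function. The base rate of 1e-4 comes from this card; the warmup and total step counts below are hypothetical values chosen purely for illustration.

```python
import math

# Sketch of a linear-warmup + cosine-annealing LR schedule.
# BASE_LR matches the card (0.0001); WARMUP_STEPS and TOTAL_STEPS are
# hypothetical values for illustration only.
BASE_LR = 1e-4
WARMUP_STEPS = 1_000
TOTAL_STEPS = 100_000

def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step."""
    if step < WARMUP_STEPS:
        # Linear warmup from 0 to BASE_LR.
        return BASE_LR * step / WARMUP_STEPS
    # Cosine annealing from BASE_LR down to 0 over the remaining steps.
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))
```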

Technical Specification

  • Input Tensors: Grayscale (1-channel), 128px Height, Variable Width.
  • Image Preprocessing: Aspect-ratio preserving resize to 128px height, followed by [0, 1] pixel normalization.
  • Decoding Strategy: Connectionist Temporal Classification (CTC) Beam Search Decoding (width=10).
  • Vocabulary: 315 characters (Mon, Burmese, digits, punctuation, and symbols). Encoding is standard UTF-8 (see charset.txt).
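A minimal sketch of the preprocessing described above: grayscale input, aspect-ratio-preserving resize to 128 px height, and [0, 1] normalization. Nearest-neighbour resampling is used here for brevity; the shipped SDKs may use a different interpolation method.

```python
import numpy as np

# Preprocessing sketch following the spec above. Assumes a (H, W) uint8
# grayscale image as input; returns a (1, 128, W') float32 tensor.
TARGET_HEIGHT = 128

def preprocess(gray: np.ndarray) -> np.ndarray:
    h, w = gray.shape
    new_w = max(1, round(w * TARGET_HEIGHT / h))  # preserve aspect ratio
    # Nearest-neighbour index maps for rows and columns.
    rows = (np.arange(TARGET_HEIGHT) * h / TARGET_HEIGHT).astype(int)
    cols = (np.arange(new_w) * w / new_w).astype(int)
    resized = gray[rows][:, cols]
    tensor = resized.astype(np.float32) / 255.0   # [0, 1] normalization
    return tensor[np.newaxis, :, :]               # add channel dimension
```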

Integration Guidelines

For developers building custom drivers:

  1. Refer to charset.txt for the index-to-character mapping (Index 0 is reserved for <blank>).
  2. Ensure input images are high-contrast and properly scaled to 128px height.
  3. ONNX models use dynamic axes for width to support varying word lengths without padding.
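To illustrate steps 1 and 3 concretely, here is a greedy CTC decoding sketch: collapse repeated indices, then drop the <blank> token (index 0, as reserved in charset.txt). The released model uses beam search (width=10); greedy decoding is shown only for simplicity, and the four-entry charset below is a toy stand-in for the real charset.txt.

```python
# Greedy CTC decoding sketch. BLANK is index 0, as reserved in
# charset.txt. The charset below is a hypothetical toy example;
# load the real index-to-character mapping from charset.txt.
BLANK = 0
charset = ["<blank>", "a", "b", " "]  # index -> character (toy example)

def ctc_greedy_decode(indices: list[int]) -> str:
    out = []
    prev = None
    for idx in indices:
        # Emit a character only when it differs from the previous frame
        # (collapsing repeats) and is not the blank token.
        if idx != prev and idx != BLANK:
            out.append(charset[idx])
        prev = idx
    return "".join(out)
```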

License

All model weights and metadata are provided under the MIT License.

Citation

@software{monocr2026,
  author = {Janakh},
  title = {MonOCR: Production-Ready OCR for Mon Language},
  year = {2026},
  url = {https://huggingface.co/janakhpon/monocr}
}