MonOCR: Production-Ready Mon Language OCR

MonOCR is an Optical Character Recognition (OCR) model engineered for the Mon language (mnw). Optimized for speed and accuracy, it recognizes Mon script in documents, digital text, and scene images.

This repository serves as the official distribution point for MonOCR model weights in deployment-ready formats.

Software Development Kits

Unified SDKs are available for seamless integration into existing applications. These SDKs handle model caching, image preprocessing, and inference out of the box.

| SDK | Platform | Registry |
| --- | --- | --- |
| monocr-onnx | Python | PyPI |
| monocr | Node.js | npm |
| monocr-go | Go | GitHub |

Model Checkpoints

| Format | Path | Intended Use Case |
| --- | --- | --- |
| ONNX | onnx/monocr.onnx | Standard deployments (server/desktop). |
| TFLite (int8) | tflite/monocr.tflite | Extreme edge/mobile (low latency, minimized size). |
| TFLite (fp16) | tflite/float16.tflite | High-efficiency mobile GPU acceleration. |
| TFLite (fp32) | tflite/float32.tflite | High-precision mobile inference. |
| PyTorch | pytorch/monocr.ckpt | Training, fine-tuning, and research. |

Performance Metrics

| Metric | Value |
| --- | --- |
| Train Loss | 1.22 |
| Validation Loss | 1.157 |
| CER | 0.025 |
| WER | 0.211 |
| Epochs | 27 |
| Best Checkpoint | monocr-epoch=27-val_loss=1.157-val_cer=0.025.ckpt |

Dataset Summary

  • Total samples: 3,030,000
  • Train size: 3,000,000
  • Validation size: 30,000
  • Data source description: Procedural synthetic text generation across multiple Mon fonts, combined with real-world digit corpora.
  • Augmentation strategy: Image-level augmentations (noise, blur, and geometric transformations) applied during training.

Model Specifications

  • Architecture type: MobileNetV3-Large Backbone + 2-layer BiLSTM + Linear CTC Head
  • Parameter count: 6.58M parameters
  • Model size: 100.73 MB (PyTorch Checkpoint)
  • Training hardware: NVIDIA GPU (Single GPU run)
  • Training time: ~2-4 days

Reproducibility

  • Optimizer: AdamW
  • Learning rate: 0.0001 (Warmup + Cosine Annealing)
  • Batch size: 48 (with Gradient Accumulation = 4)
  • Loss function: CTCLoss (with label smoothing $\epsilon=0.05$)
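The "Warmup + Cosine Annealing" schedule above can be sketched as a simple step-to-rate function. The base rate of 1e-4 comes from this card; the warmup and total step counts below are hypothetical values chosen purely for illustration.

```python
import math

# Sketch of a linear-warmup + cosine-annealing LR schedule.
# BASE_LR matches the card (0.0001); WARMUP_STEPS and TOTAL_STEPS are
# hypothetical values for illustration only.
BASE_LR = 1e-4
WARMUP_STEPS = 1_000
TOTAL_STEPS = 100_000

def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step."""
    if step < WARMUP_STEPS:
        # Linear warmup from 0 to BASE_LR.
        return BASE_LR * step / WARMUP_STEPS
    # Cosine annealing from BASE_LR down to 0 over the remaining steps.
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))
```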

Technical Specification

  • Input Tensors: Grayscale (1-channel), 128px Height, Variable Width.
  • Image Preprocessing: Aspect-ratio preserving resize to 128px height, followed by [0, 1] pixel normalization.
  • Decoding Strategy: Connectionist Temporal Classification (CTC) Beam Search Decoding (width=10).
  • Vocabulary: 315 characters (Mon, Burmese, digits, punctuation, and symbols). Encoding is standard UTF-8 (see charset.txt).
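A minimal sketch of the preprocessing described above: grayscale input, aspect-ratio-preserving resize to 128 px height, and [0, 1] normalization. Nearest-neighbour resampling is used here for brevity; the shipped SDKs may use a different interpolation method.

```python
import numpy as np

# Preprocessing sketch following the spec above. Assumes a (H, W) uint8
# grayscale image as input; returns a (1, 128, W') float32 tensor.
TARGET_HEIGHT = 128

def preprocess(gray: np.ndarray) -> np.ndarray:
    h, w = gray.shape
    new_w = max(1, round(w * TARGET_HEIGHT / h))  # preserve aspect ratio
    # Nearest-neighbour index maps for rows and columns.
    rows = (np.arange(TARGET_HEIGHT) * h / TARGET_HEIGHT).astype(int)
    cols = (np.arange(new_w) * w / new_w).astype(int)
    resized = gray[rows][:, cols]
    tensor = resized.astype(np.float32) / 255.0   # [0, 1] normalization
    return tensor[np.newaxis, :, :]               # add channel dimension
```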

Integration Guidelines

For developers building custom drivers:

  1. Refer to charset.txt for the index-to-character mapping (Index 0 is reserved for <blank>).
  2. Ensure input images are high-contrast and properly scaled to 128px height.
  3. ONNX models use dynamic axes for width to support varying word lengths without padding.
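To illustrate steps 1 and 3 concretely, here is a greedy CTC decoding sketch: collapse repeated indices, then drop the <blank> token (index 0, as reserved in charset.txt). The released model uses beam search (width=10); greedy decoding is shown only for simplicity, and the four-entry charset below is a toy stand-in for the real charset.txt.

```python
# Greedy CTC decoding sketch. BLANK is index 0, as reserved in
# charset.txt. The charset below is a hypothetical toy example;
# load the real index-to-character mapping from charset.txt.
BLANK = 0
charset = ["<blank>", "a", "b", " "]  # index -> character (toy example)

def ctc_greedy_decode(indices: list[int]) -> str:
    out = []
    prev = None
    for idx in indices:
        # Emit a character only when it differs from the previous frame
        # (collapsing repeats) and is not the blank token.
        if idx != prev and idx != BLANK:
            out.append(charset[idx])
        prev = idx
    return "".join(out)
```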

License

All model weights and metadata are provided under the MIT License.

Citation

@software{monocr2026,
  author = {Janakh},
  title = {MonOCR: Production-Ready OCR for Mon Language},
  year = {2026},
  url = {https://huggingface.co/janakhpon/monocr}
}