Punjabi Shahmukhi to Gurmukhi Transliteration System

Our Punjabi transliteration systems are supervised, bidirectional neural machine translation (NMT) systems trained on an unsupervised corpus. They convert text between the Gurmukhi and Shahmukhi scripts. The Gurmukhi-to-Shahmukhi model achieves a BLEU score of 98.1 and 99.5% word-level accuracy, while the Shahmukhi-to-Gurmukhi model achieves a BLEU score of 87.7.

Corpus Details

  • Total Sentences: 6.3 million
  • Domains Covered: Various domains, drawn from CCAligned, CCMatrix, TED, QED, OPUS, TICO, Wikimedia, MultiCCAligned, EMILLE, IJCNLP, XLEnt, and ParaCrawl.
  • Test Corpus: FLORES-101

Model Details

- **BLEU Score:** 87.7 (Shahmukhi-to-Gurmukhi)

You may also explore our Gurmukhi-to-Shahmukhi model, which achieves a BLEU score of 98.1, here.

Usage

These resources are intended to facilitate research and development in the field of Punjabi transliteration. They can be used to train new models or improve existing ones, enabling high-quality transliteration between Gurmukhi and Shahmukhi scripts.
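
When training or improving a model with these resources, it helps to check transliteration output against references. The following is a minimal sketch of a word-level accuracy check in Python. It assumes accuracy means the fraction of reference words reproduced exactly at the same position; the exact definition behind the 99.5% figure reported above is not given here and may differ. The function name `word_accuracy` and the sample sentences are illustrative only.

```python
import unicodedata


def word_accuracy(hypotheses, references):
    """Fraction of reference words matched exactly at the same position.

    Assumption: position-wise comparison of whitespace-separated words.
    This is an illustrative metric, not necessarily the one used in the paper.
    """
    correct = 0
    total = 0
    for hyp, ref in zip(hypotheses, references):
        # NFC normalization so that equivalent Unicode sequences compare equal.
        hyp_words = unicodedata.normalize("NFC", hyp).split()
        ref_words = unicodedata.normalize("NFC", ref).split()
        total += len(ref_words)
        correct += sum(h == r for h, r in zip(hyp_words, ref_words))
    return correct / total if total else 0.0


# Illustrative Gurmukhi sentences: the model output differs in the last word.
references = ["ਮੈਂ ਘਰ ਜਾ ਰਿਹਾ ਹਾਂ"]
hypotheses = ["ਮੈਂ ਘਰ ਜਾ ਰਿਹਾ ਹੈ"]
print(word_accuracy(hypotheses, references))
```

Position-wise matching is strict: a single inserted or dropped word shifts every later comparison, so an edit-distance-based alignment may be a better choice when output and reference lengths differ.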

Citation

If you use our model, kindly cite our paper:

@article{Shehzadi2024,
  title={Unsupervised Punjabi Corpus and Neural Machine Transliteration System},
  author={Shehzadi Ambreen and Sadaf Abdul Rauf and MG Abbas Malik and Muhammad Imran},
  journal={Heliyon},
  year={2024},
  note={Under review}
}