# Transliteration Models for Indian languages
These are models for transliteration involving Indian languages.
The models are character-level Statistical Machine Translation (SMT) systems trained with Moses on a parallel corpus of transliteration pairs, so you will need Moses to use them.
The transliteration corpus was itself mined in an unsupervised fashion from a translation corpus.
Currently, we have trained transliteration models for five language pairs: bn-hi, ta-hi, te-hi, en-hi and mr-hi.

Moses supports transliteration from version 2.1 onwards, so please make sure you have Moses 2.1 or later installed.

To run the transliteration module with Moses, use the following command:

    $moseshome/mosesdecoder/scripts/Transliteration/post-decoding-transliteration.pl \
      --moses-src-dir $moseshome/mosesdecoder --external-bin-dir $moseshome/tools \
      --transliteration-model-dir {path to transliteration model folder} \
      --oov-file {path to OOV file} \
      --input-file {input file to be transliterated} --output-file {output file location} \
      --input-extension {input language code, e.g. en} --output-extension {output language code, e.g. hi} \
      --language-model {path to language model} \
      --decoder $moseshome/mosesdecoder/bin/moses

The OOV file lists the out-of-vocabulary (OOV) words to transliterate: each line contains all OOV words of the corresponding input line, separated by spaces.
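
Since the OOV file must be line-aligned with the input file (each line holds the space-separated OOV words of the corresponding input line), a mismatch between the two files is an easy mistake to make. Below is a minimal sketch of a pre-flight check, assuming tokenized, space-separated input. `check_oov_file` is a hypothetical helper, not part of Moses; it verifies that both files have the same number of lines and that every OOV word occurs as a token in its input line.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of Moses): the OOV file must be
# line-aligned with the input file, each line holding the space-separated
# OOV words of the corresponding input line.
check_oov_file() {
  local input=$1 oov=$2
  local n_in n_oov
  n_in=$(wc -l < "$input")
  n_oov=$(wc -l < "$oov")
  # Both files must have the same number of lines.
  if (( n_in != n_oov )); then
    echo "line count mismatch: $input has $n_in lines, $oov has $n_oov" >&2
    return 1
  fi
  # Assumes tokenized input: every OOV word should appear as a whole
  # space-separated token in the corresponding input line.
  # "$1 = $1" normalizes runs of blanks/tabs in the input line to single spaces.
  awk 'NR == FNR { $1 = $1; line[FNR] = " " $0 " "; next }
       {
         for (i = 1; i <= NF; i++)
           if (index(line[FNR], " " $i " ") == 0) {
             printf "line %d: OOV word \"%s\" not in input line\n", FNR, $i > "/dev/stderr"
             bad = 1
           }
       }
       END { exit bad }' "$input" "$oov"
}
```

For example, `check_oov_file input.en input.oov` returns a non-zero status and prints the offending lines to stderr if the two files disagree.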

A sample invocation looks like this:

    export moseshome={path to moses installation}
    $moseshome/mosesdecoder/scripts/Transliteration/post-decoding-transliteration.pl \
      --moses-src-dir $moseshome/mosesdecoder --external-bin-dir $moseshome/tools \
      --transliteration-model-dir /home/ratish/project/nlp_resources/indic_nlp_resources/transliterate/en-hi \
      --oov-file /home/ratish/project/translit/input.oov \
      --input-file /home/ratish/project/translit/input.en \
      --output-file /home/ratish/project/translit/output.hi \
      --input-extension en --output-extension hi \
      --language-model /home/ratish/project/translit/lm/nc.binlm.1 \
      --decoder $moseshome/mosesdecoder/bin/moses

So far, we have described transliteration as a post-decoding step for machine translation, where only the OOV words are transliterated.
If you need the models purely for transliteration, use the same file as both the input file and the OOV file.

Sample input file:

    New Delhi is capital of India
    India is worlds seventh largest nation in the World

OOV file (identical to the input file):

    New Delhi is capital of India
    India is worlds seventh largest nation in the World

Running the transliteration module produces the following output:

    न्यू डेल्ही इस कैपिटल आफ इंडिया
    इंडिया इस वर्ल्ड सेवंथ लारगेस्ट नेशन इन थे वर्ल्ड
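
For standalone transliteration, the command can be wrapped so that a single file is passed as both `--input-file` and `--oov-file`, which makes every word in it a transliteration candidate. This is a minimal sketch: `translit_standalone` and the `TRANSLIT_SCRIPT` override (useful for a dry run with `echo`) are hypothetical names, and `moseshome` must point to your Moses installation as in the sample invocation.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper for standalone transliteration: the input file doubles
# as the OOV file, so every word in it is transliterated.
# Requires $moseshome to point to the Moses installation.
# TRANSLIT_SCRIPT is an illustrative override, e.g. TRANSLIT_SCRIPT=echo for a dry run.
translit_standalone() {
  if [ "$#" -ne 6 ]; then
    echo "usage: translit_standalone INPUT OUTPUT SRC_LANG TGT_LANG MODEL_DIR LM" >&2
    return 2
  fi
  local input=$1 output=$2 src=$3 tgt=$4 model_dir=$5 lm=$6
  local script=${TRANSLIT_SCRIPT:-$moseshome/mosesdecoder/scripts/Transliteration/post-decoding-transliteration.pl}
  # Same flags as the documented command; only --oov-file and --input-file share a path.
  "$script" \
    --moses-src-dir "$moseshome/mosesdecoder" \
    --external-bin-dir "$moseshome/tools" \
    --transliteration-model-dir "$model_dir" \
    --oov-file "$input" \
    --input-file "$input" \
    --output-file "$output" \
    --input-extension "$src" --output-extension "$tgt" \
    --language-model "$lm" \
    --decoder "$moseshome/mosesdecoder/bin/moses"
}
```

For example, `translit_standalone input.en output.hi en hi {path to en-hi model folder} {path to language model}` corresponds to the sample run above, and setting `TRANSLIT_SCRIPT=echo` first prints the arguments instead of invoking Moses.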