|
--- |
|
language: |
|
- zh |
|
- bo |
|
- kk |
|
- ko |
|
- mn |
|
- ug |
|
- yue |
|
license: "apache-2.0" |
|
--- |
|
|
|
## CINO: Pre-trained Language Models for Chinese Minority Languages (中国少数民族预训练模型)
|
|
|
Multilingual pre-trained language models (PLMs), such as mBERT and XLM-R, provide multilingual and cross-lingual abilities for language understanding.

We have seen rapid progress in building multilingual PLMs in recent years.

However, few efforts have been made to build PLMs for Chinese minority languages, which hinders researchers from building powerful NLP systems for them.
|
|
|
To address the absence of Chinese minority PLMs, the Joint Laboratory of HIT and iFLYTEK Research (HFL) proposes CINO (Chinese-miNOrity pre-trained language model), which is built on XLM-R with additional pre-training on Chinese minority-language corpora, including Tibetan, Mongolian (Uighur form), Uyghur, Kazakh (Arabic form), Korean, Zhuang, and Cantonese.
|
|
|
Please see our GitHub repository (in Chinese) for more details: https://github.com/ymcui/Chinese-Minority-PLM
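Since CINO keeps the XLM-R architecture, it can be loaded with the standard Hugging Face `transformers` classes and used as a multilingual sentence encoder. Below is a minimal sketch; the hub model identifier shown in the comments is an assumption (check the GitHub repository for the exact published names), and the masked mean pooling helper is one common way, not the only way, to turn token vectors into a sentence vector.

```python
# Sketch: using CINO as a multilingual sentence encoder.
# The model id below is an assumption -- see the repo for the exact hub names.
#
#   from transformers import AutoTokenizer, AutoModel
#   tokenizer = AutoTokenizer.from_pretrained("hfl/cino-large-v2")  # assumed id
#   model = AutoModel.from_pretrained("hfl/cino-large-v2")
#   batch = tokenizer(["你好", "ཁྱེད་རང་བདེ་མོ།"], padding=True, return_tensors="pt")
#   hidden = model(**batch).last_hidden_state.detach().numpy()  # (batch, seq, hidden)
#
# When pooling token vectors into one sentence vector, padding positions
# must be excluded; a masked mean pool does exactly that:
import numpy as np

def masked_mean_pool(hidden: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token vectors over real (non-padding) positions only."""
    mask = attention_mask[..., None].astype(hidden.dtype)  # (batch, seq, 1)
    summed = (hidden * mask).sum(axis=1)                   # zero out padding
    counts = np.clip(mask.sum(axis=1), 1e-9, None)         # real-token counts
    return summed / counts                                 # (batch, hidden)
```

The resulting vectors can then be compared with cosine similarity or fed to a lightweight classifier, which is how encoder-only PLMs like XLM-R are typically applied to downstream tasks.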
|
|
|
You may also be interested in:
|
|
|
Chinese MacBERT: https://github.com/ymcui/MacBERT |
|
Chinese BERT series: https://github.com/ymcui/Chinese-BERT-wwm |
|
Chinese ELECTRA: https://github.com/ymcui/Chinese-ELECTRA |
|
Chinese XLNet: https://github.com/ymcui/Chinese-XLNet |
|
Knowledge Distillation Toolkit - TextBrewer: https://github.com/airaria/TextBrewer |
|
|
|
More resources by HFL: https://github.com/ymcui/HFL-Anthology |
|
|
|
|