---
language:
- zh
- bo
- kk
- ko
- mn
- ug
- yue
license: "apache-2.0"
---

## CINO: Pre-trained Language Models for Chinese Minority Languages (中国少数民族预训练模型)

Multilingual pre-trained language models (PLMs), such as mBERT and XLM-R, provide multilingual and cross-lingual capabilities for language understanding. Multilingual PLMs have progressed rapidly in recent years. However, few PLMs have been built for Chinese minority languages, which hinders researchers from building powerful NLP systems for them.

To address this gap, the Joint Laboratory of HIT and iFLYTEK Research (HFL) proposes CINO (Chinese-miNOrity pre-trained language model), which is built on XLM-R with additional pre-training on Chinese minority language corpora, covering languages such as:

- Chinese, 中文 (zh)
- Tibetan, 藏语 (bo)
- Mongolian (Uighur form), 蒙语 (mn)
- Uyghur, 维吾尔语 (ug)
- Kazakh (Arabic form), 哈萨克语 (kk)
- Korean, 朝鲜语 (ko)
- Zhuang, 壮语
- Cantonese, 粤语 (yue)
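
Because CINO is built on XLM-R, a checkpoint can most likely be loaded through the generic Auto classes of the Hugging Face `transformers` library. The following is a minimal sketch under stated assumptions, not the authors' official usage: the Hub ID `hfl/cino-base-v2` is a placeholder that should be replaced with the checkpoint you actually use, and the names `CINO_LANGUAGES`, `load_cino`, and `encode` are hypothetical helpers introduced here for illustration. The sketch assumes `transformers` and `torch` are installed.

```python
# Language codes taken from this model card's metadata.
# Zhuang is listed in the card without a code, so it is omitted here.
CINO_LANGUAGES = {
    "zh": "Chinese",
    "bo": "Tibetan",
    "mn": "Mongolian (Uighur form)",
    "ug": "Uyghur",
    "kk": "Kazakh (Arabic form)",
    "ko": "Korean",
    "yue": "Cantonese",
}

# ASSUMPTION: placeholder Hub ID, not stated in this card; replace as needed.
MODEL_ID = "hfl/cino-base-v2"


def load_cino(model_id: str = MODEL_ID):
    """Load a tokenizer and encoder (downloads weights on first call)."""
    # Imported lazily so this module can be imported without transformers.
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModel.from_pretrained(model_id)
    model.eval()  # inference mode: disables dropout
    return tokenizer, model


def encode(text: str, tokenizer, model):
    """Return the encoder's last hidden states for a single sentence."""
    import torch

    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():  # no gradients needed for feature extraction
        outputs = model(**inputs)
    # Shape: (1, sequence_length, hidden_size)
    return outputs.last_hidden_state
```

To try it, call `tokenizer, model = load_cino()` and then `encode("བོད་ཡིག", tokenizer, model)`. For fine-tuning on a downstream task, a task-specific class such as `AutoModelForSequenceClassification` would replace `AutoModel`.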

For more details, see our GitHub repository (in Chinese): https://github.com/ymcui/Chinese-Minority-PLM

You may also be interested in:

Chinese MacBERT: https://github.com/ymcui/MacBERT

Chinese BERT series: https://github.com/ymcui/Chinese-BERT-wwm

Chinese ELECTRA: https://github.com/ymcui/Chinese-ELECTRA

Chinese XLNet: https://github.com/ymcui/Chinese-XLNet

Knowledge Distillation Toolkit - TextBrewer: https://github.com/airaria/TextBrewer

More resources by HFL: https://github.com/ymcui/HFL-Anthology