XLM-SWCM: Multilingual Encoder with Shared Weights Pretraining
Overview
XLM-SWCM (Cross-lingual Language Model with Shared Weights Cross-lingual Modeling) is a sequence-to-sequence model designed for extremely low-resource languages. Our framework introduces a novel weight-sharing mechanism between the encoder and decoder, enabling effective knowledge transfer from pretrained multilingual encoders to text generation tasks.
Key Innovations
- Shared Weight Framework: Strategic weight reuse between encoder and decoder layers
- Hybrid Decoder Architecture (see the sketch after this list), combining:
  - Standard transformer decoder layers
  - Custom decoder layers with a dual FFN structure
  - An optimized insertion pattern: one normal layer for every three custom layers
- Efficient Adaptation: Enables effective text generation with minimal training data
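The weight-sharing and layer-insertion pattern can be summarized in a few lines of PyTorch. This is a minimal sketch, not the released implementation: the `EncoderLayer` stand-in, the `build_decoder` helper, and the residual ordering and (omitted) masking/normalization details are assumptions; only the copy-from-encoder idea, the dual FFN, and the one-normal-per-three-custom pattern follow the description above.

```python
import copy
import torch.nn as nn


class EncoderLayer(nn.Module):
    """Stand-in for one pretrained encoder layer (e.g., from CINO)."""

    def __init__(self, d_model: int = 768, n_heads: int = 12):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )


class CustomDecoderLayer(nn.Module):
    """Weight-shared layer: self-attention and one FFN are copied from an
    encoder layer; cross-attention and a second FFN (the "dual FFN") are
    randomly initialized. Ordering and masks here are illustrative only."""

    def __init__(self, enc: EncoderLayer, d_model: int = 768, n_heads: int = 12):
        super().__init__()
        self.self_attn = copy.deepcopy(enc.self_attn)  # shared with encoder
        self.ffn_shared = copy.deepcopy(enc.ffn)       # shared with encoder
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn_new = nn.Sequential(                  # new, randomly initialized
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x, memory):
        # Causal/padding masks omitted for brevity.
        x = x + self.self_attn(x, x, x, need_weights=False)[0]
        x = x + self.ffn_shared(x)
        x = x + self.cross_attn(x, memory, memory, need_weights=False)[0]
        return x + self.ffn_new(x)


def build_decoder(encoder_layers, d_model=768, n_heads=12):
    """Insert one randomly initialized normal layer after every three
    weight-shared custom layers (the one-per-three pattern above)."""
    layers = nn.ModuleList()
    for i, enc in enumerate(encoder_layers):
        layers.append(CustomDecoderLayer(enc, d_model, n_heads))
        if (i + 1) % 3 == 0:
            layers.append(nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True))
    return layers


decoder = build_decoder([EncoderLayer() for _ in range(12)])
print(len(decoder))  # 12 custom + 4 normal = 16 layers
```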
Model Architecture
| Component | Description |
|---|---|
| Encoder | XLM-RoBERTa base (CINO v2 variant) |
| Decoder | Hybrid transformer combining `NormalDecoderLayer` (randomly initialized standard layers) and `CustomDecoderLayer` (weight-shared layers with a dual FFN structure) |
| Parameters | 492M total |
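The parameter count in the table can be checked once the checkpoint is loaded. A quick sketch, assuming the repository loads through the standard transformers seq2seq API (`trust_remote_code` is a guess given the custom decoder layers):

```python
from transformers import AutoModelForSeq2SeqLM

# Assumes the KEVVVV/xlm-swcm checkpoint loads via the standard API.
model = AutoModelForSeq2SeqLM.from_pretrained("KEVVVV/xlm-swcm", trust_remote_code=True)
total = sum(p.numel() for p in model.parameters())
print(f"{total / 1e6:.0f}M parameters")  # table above says 492M
```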
Advanced Features
- Beam search decoding
- Mixed-precision training
- Cross-lingual transfer learning
For detailed usage instructions, see our GitHub repository; a minimal inference sketch follows below.
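The sketch combines the beam-search and mixed-precision features listed above. It assumes the standard transformers generation interface (`num_beams`, `max_new_tokens` are generic generation arguments, not confirmed defaults for this model), and the input string is only a placeholder:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Assumed loading path; trust_remote_code may be required for the custom decoder.
tokenizer = AutoTokenizer.from_pretrained("KEVVVV/xlm-swcm")
model = AutoModelForSeq2SeqLM.from_pretrained("KEVVVV/xlm-swcm", trust_remote_code=True)

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device).eval()

# Placeholder input; replace with Tibetan/Uyghur/Kazakh/Mongolian/Chinese text.
inputs = tokenizer("your source text here", return_tensors="pt").to(device)

# Beam search decoding under mixed precision (fp16 on GPU, bf16 on CPU).
dtype = torch.float16 if device == "cuda" else torch.bfloat16
with torch.no_grad(), torch.autocast(device_type=device, dtype=dtype):
    output_ids = model.generate(**inputs, num_beams=4, max_new_tokens=128)

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```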
Supported Languages
Primary focus on Chinese minority languages:
- Tibetan (bo)
- Uyghur (ug)
- Kazakh (kk)
- Mongolian (mn)
- Chinese (zh)
Citation
```bibtex
@article{su2025multilingualencoderknowsrealize,
  author     = {Zeli Su and Ziyin Zhang and Guixian Xu and Jianing Liu and Xu Han and Ting Zhang and Yushuang Dong},
  title      = {Multilingual Encoder Knows more than You Realize: Shared Weights Pretraining for Extremely Low-Resource Languages},
  journal    = {CoRR},
  volume     = {abs/2502.10852},
  year       = {2025},
  url        = {https://doi.org/10.48550/arXiv.2502.10852},
  doi        = {10.48550/ARXIV.2502.10852},
  eprinttype = {arXiv},
  eprint     = {2502.10852}
}
```
Base Model
XLM-SWCM (KEVVVV/xlm-swcm) is built on hfl/cino-base-v2.