XLM-SWCM: Multilingual Encoder with Shared Weights Pretraining

Overview

XLM-SWCM (Cross-lingual Language Model with Shared Weights Cross-lingual Modeling) is a sequence-to-sequence model designed for text generation in extremely low-resource languages. Our framework introduces a weight-sharing mechanism between the encoder and decoder components, enabling knowledge learned by a multilingual encoder to transfer effectively to generation tasks.

Key Innovations

  • Shared Weight Framework: Strategic weight reuse between encoder and decoder layers
  • Hybrid Decoder Architecture: Combines:
    • Standard transformer decoder layers
    • Custom decoder layers with dual FFN structure
    • Optimized layer insertion pattern: one normal layer after every three custom layers (see the sketch after this list)
  • Efficient Adaptation: Enables effective text generation with minimal training data
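
The layer arrangement above can be made concrete with a short Python sketch. This is illustrative only, not the released code; the builder function is hypothetical, and the only fact taken from this card is that one randomly initialized (normal) layer follows every three weight-shared (custom) layers.

```python
# Illustrative decoder layer plan: one NormalDecoderLayer after every
# three CustomDecoderLayers (the builder function itself is hypothetical).

def build_decoder_layer_plan(num_layers: int) -> list[str]:
    plan = []
    for i in range(1, num_layers + 1):
        if i % 4 == 0:
            plan.append("NormalDecoderLayer")    # randomly initialized
        else:
            plan.append("CustomDecoderLayer")    # weights shared with the encoder
    return plan

print(build_decoder_layer_plan(8))
# ['CustomDecoderLayer', 'CustomDecoderLayer', 'CustomDecoderLayer', 'NormalDecoderLayer',
#  'CustomDecoderLayer', 'CustomDecoderLayer', 'CustomDecoderLayer', 'NormalDecoderLayer']
```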

Model Architecture

  • Encoder: XLM-RoBERTa base (CINO v2 variant, hfl/cino-base-v2)
  • Decoder: hybrid transformer stack built from two layer types:
    • NormalDecoderLayer: randomly initialized standard decoder layers
    • CustomDecoderLayer: weight-shared layers with a dual FFN structure
  • Parameters: 492M in total
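
The weight sharing inside a CustomDecoderLayer can be sketched in PyTorch as follows. This is not the released implementation: the ToyEncoderLayer stand-in, the sub-layer ordering, and the dimensions are assumptions. The point it illustrates is the dual FFN structure, where self-attention and one FFN are copied from an encoder layer while cross-attention and a second FFN are newly initialized for generation.

```python
import copy
import torch
import torch.nn as nn

class ToyEncoderLayer(nn.Module):
    # Stand-in for one layer of the pretrained encoder (e.g. CINO / XLM-R);
    # only the pieces the decoder reuses are modeled here.
    def __init__(self, d_model=768, n_heads=12, d_ff=3072):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))

class CustomDecoderLayer(nn.Module):
    """Dual-FFN decoder layer: self-attention and one FFN are initialized from an
    encoder layer, while cross-attention and a second FFN are trained from scratch."""
    def __init__(self, encoder_layer, d_model=768, n_heads=12, d_ff=3072):
        super().__init__()
        # Weight-shared components (copied from the encoder layer).
        self.self_attn = copy.deepcopy(encoder_layer.self_attn)
        self.ffn_shared = copy.deepcopy(encoder_layer.ffn)
        # Newly initialized components needed for conditional generation.
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn_new = nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
        self.norms = nn.ModuleList([nn.LayerNorm(d_model) for _ in range(4)])

    def forward(self, x, memory, tgt_mask=None):
        x = self.norms[0](x + self.self_attn(x, x, x, attn_mask=tgt_mask, need_weights=False)[0])
        x = self.norms[1](x + self.cross_attn(x, memory, memory, need_weights=False)[0])
        x = self.norms[2](x + self.ffn_shared(x))
        x = self.norms[3](x + self.ffn_new(x))
        return x

enc_layer = ToyEncoderLayer()
dec_layer = CustomDecoderLayer(enc_layer)
hidden = torch.randn(2, 5, 768)    # decoder hidden states
memory = torch.randn(2, 7, 768)    # encoder outputs
print(dec_layer(hidden, memory).shape)  # torch.Size([2, 5, 768])
```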

Advanced Features

  • Beam search decoding
  • Mixed-precision training
  • Cross-lingual transfer learning

For detailed usage instructions, see our GitHub repository; a minimal, unofficial loading sketch is shown below.
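
The snippet below is a hypothetical loading and decoding example, assuming the checkpoint can be driven through Hugging Face transformers as a standard seq2seq model; trust_remote_code and the exact entry point are assumptions, so consult the GitHub repository for the supported workflow. It exercises the beam search decoding and half-precision loading mentioned under Advanced Features.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "KEVVVV/xlm-swcm"  # repository shown on this model card

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,   # half-precision inference (assumption: GPU available)
    trust_remote_code=True,      # assumption: custom decoder code ships with the repo
).eval()

inputs = tokenizer("source text in a supported language", return_tensors="pt")
outputs = model.generate(
    **inputs,
    num_beams=4,                 # beam search decoding
    max_new_tokens=64,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```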

Supported Languages

Primary focus on Chinese minority languages, with Chinese also supported:

  • Tibetan (bo)
  • Uyghur (ug)
  • Kazakh (kk)
  • Mongolian (mn)
  • Chinese (zh)

Citation

@article{su2025multilingualencoderknowsrealize,
  author       = {Zeli Su and Ziyin Zhang and Guixian Xu and Jianing Liu and Xu Han and Ting Zhang and Yushuang Dong},
  title        = {Multilingual Encoder Knows more than You Realize: Shared Weights Pretraining
                  for Extremely Low-Resource Languages},
  journal      = {CoRR},
  volume       = {abs/2502.10852},
  year         = {2025},
  url          = {https://doi.org/10.48550/arXiv.2502.10852},
  doi          = {10.48550/ARXIV.2502.10852},
  eprinttype    = {arXiv},
  eprint       = {2502.10852}
}