---
language: en
---
This is a Hugging Face Transformers-style conversion of the original __SMoE 15B-parameter__ model in __BFLOAT16__ precision, from the paper "[Efficient Large Scale Language Modeling with Mixtures of Experts](https://arxiv.org/abs/2112.10684)" by Artetxe et al. The original model card can be found at https://github.com/facebookresearch/fairseq/blob/main/examples/moe_lm/model_card.md.
A usage example and the modeling code can be found at https://github.com/pingzhili/light-fairseq.