---
license: mit
language:
- en
- zh
---
# Model Card for sparsing-law-0.1b-relu
- **Paper:** [paper](https://arxiv.org/pdf/2411.02335)
- **Repository and demo code:** [github](https://github.com/thunlp/SparsingLaw)

This model is ReLU-activated and contains approximately 0.1 billion non-embedding parameters.
The model was trained from scratch using the pre-training dataset described in our paper, with the WSD (Warmup-Stable-Decay) learning rate scheduler. It represents the final checkpoint of the stable stage in WSD, meaning it has not undergone the decay stage.
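
To make the WSD stages concrete, here is a minimal sketch of a Warmup-Stable-Decay schedule in plain Python. The function name `wsd_lr`, all step counts, and the learning-rate values are hypothetical, and the linear shape of both warmup and decay is an assumption; the actual hyperparameters and decay form used for this model are those given in the paper, not the ones below.

```python
def wsd_lr(step, peak_lr=1e-2, warmup_steps=100, stable_steps=1000,
           decay_steps=200, min_lr=1e-3):
    """Illustrative WSD schedule (all default values are hypothetical).

    Warmup: linear ramp up to peak_lr.
    Stable: constant peak_lr.
    Decay:  linear anneal from peak_lr to min_lr (assumed shape).
    """
    if step < warmup_steps:
        # Warmup stage: ramp linearly, reaching peak_lr on the last warmup step.
        return peak_lr * (step + 1) / warmup_steps
    if step < warmup_steps + stable_steps:
        # Stable stage: hold the learning rate constant.
        return peak_lr
    # Decay stage: interpolate towards min_lr, then stay there.
    progress = min(1.0, (step - warmup_steps - stable_steps) / decay_steps)
    return peak_lr + (min_lr - peak_lr) * progress


# Last step of the stable stage under these hypothetical settings.
last_stable_step = 100 + 1000 - 1
lr_at_checkpoint = wsd_lr(last_stable_step)
```

Under these assumed settings, a checkpoint saved at `last_stable_step` is still at the peak learning rate, which matches the description above: the released weights come from the end of the stable stage, before any decay is applied.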