---
license: mit
language:
- en
- zh
---

# Model Card for sparsing-law-0.1b-relu

- **Paper:** [paper](todo)
- **Repository and demo code:** [github](https://github.com/thunlp/SparsingLaw)

This model is ReLU-activated and contains approximately 0.1 billion non-embedding parameters.
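Because the model uses ReLU activations, many intermediate neuron outputs are exactly zero, which is the activation sparsity that the Sparsing Law paper studies. The helper below is an illustrative sketch of how such sparsity can be measured on a batch of pre-activations; it is not taken from the SparsingLaw codebase.

```python
import numpy as np

def relu_sparsity(pre_activations: np.ndarray) -> float:
    """Fraction of neuron outputs that ReLU sets to exactly zero.

    Illustrative helper, not part of the SparsingLaw repository.
    """
    post = np.maximum(pre_activations, 0.0)  # apply ReLU
    return float(np.mean(post == 0.0))       # share of zeroed activations
```

For example, on pre-activations `[-1.0, 0.5, -2.0, 3.0]` the function returns `0.5`, since half the entries are negative and thus zeroed by ReLU.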

The model was trained from scratch using the pre-training dataset described in our paper, with the WSD (Warmup-Stable-Decay) learning rate scheduler. It represents the final checkpoint of the stable stage in WSD, meaning it has not undergone the decay stage.
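A WSD schedule ramps the learning rate up, holds it constant for most of training, and only decays it at the very end; this checkpoint stops at the end of the constant (stable) plateau. The sketch below illustrates the three stages with linear warmup and linear decay; the `warmup_frac` and `decay_frac` values are illustrative assumptions, not the paper's actual hyperparameters.

```python
def wsd_lr(step: int, total_steps: int, peak_lr: float,
           warmup_frac: float = 0.1, decay_frac: float = 0.1) -> float:
    """Piecewise WSD (Warmup-Stable-Decay) learning-rate schedule.

    Illustrative sketch; fractions are assumed, not from the paper.
    """
    warmup_steps = int(total_steps * warmup_frac)
    decay_start = int(total_steps * (1.0 - decay_frac))
    if step < warmup_steps:
        # Warmup: linear ramp from 0 to peak_lr.
        return peak_lr * step / max(warmup_steps, 1)
    if step < decay_start:
        # Stable: constant plateau (this checkpoint ends here).
        return peak_lr
    # Decay: linear ramp down to 0 (not applied to this checkpoint).
    return peak_lr * (total_steps - step) / max(total_steps - decay_start, 1)
```

Since this checkpoint is the final one of the stable stage, its training never entered the third branch of the schedule.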