demerzel-iv committed on
Commit a728903 · verified · 1 parent: cab8a9c

Update README.md

Files changed (1)
  1. README.md +15 -3
README.md CHANGED
@@ -1,3 +1,15 @@
- ---
- license: mit
- ---
+ ---
+ license: mit
+ language:
+ - en
+ - zh
+ ---
+
+ # Model Card for sparsing-law-0.1b-relu
+
+ - **Paper [optional]:** [paper](todo)
+ - **Repository and demo code:** [github](https://github.com/thunlp/SparsingLaw)
+
+ This model is ReLU-activated and contains approximately 0.1 billion non-embedding parameters.
+
+ The model was trained from scratch using the pre-training dataset described in our paper, with the WSD (Warmup-Stable-Decay) learning rate scheduler. It represents the final checkpoint of the stable stage in WSD, meaning it has not undergone the decay stage.
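The WSD (Warmup-Stable-Decay) schedule named in the model card can be illustrated with a short sketch. This is not the training code from the paper or the SparsingLaw repository. The linear warmup, the linear decay shape, the function names `wsd_lr`/`wsd_stage`, and every step count and learning rate below are hypothetical choices made for illustration (real WSD runs may use a different decay curve, e.g. exponential). The point it shows is that a checkpoint taken at the last step of the stable stage was trained at the constant peak learning rate and has not seen any decay.

```python
def wsd_lr(step: int, peak_lr: float, warmup_steps: int, stable_steps: int,
           decay_steps: int, min_lr: float = 0.0) -> float:
    """Learning rate at `step` under a hypothetical Warmup-Stable-Decay schedule.

    Assumed shape: linear warmup from 0 to peak_lr, constant plateau at
    peak_lr, then linear decay to min_lr (the decay shape is an assumption).
    """
    if step < 0:
        raise ValueError("step must be non-negative")
    if step < warmup_steps:
        # Warmup stage: ramp linearly from 0 toward peak_lr.
        return peak_lr * step / warmup_steps
    if step < warmup_steps + stable_steps:
        # Stable stage: hold the peak learning rate constant.
        return peak_lr
    decay_step = step - warmup_steps - stable_steps
    if decay_step >= decay_steps:
        # Past the end of the decay stage: stay at the floor.
        return min_lr
    # Decay stage: interpolate linearly from peak_lr down to min_lr.
    frac = decay_step / decay_steps
    return peak_lr + (min_lr - peak_lr) * frac


def wsd_stage(step: int, warmup_steps: int, stable_steps: int) -> str:
    """Name of the WSD stage that `step` falls in."""
    if step < warmup_steps:
        return "warmup"
    if step < warmup_steps + stable_steps:
        return "stable"
    return "decay"


if __name__ == "__main__":
    # Hypothetical configuration, not the values used to train this model.
    cfg = dict(peak_lr=1e-2, warmup_steps=100, stable_steps=900, decay_steps=100)
    last_stable = cfg["warmup_steps"] + cfg["stable_steps"] - 1
    stage = wsd_stage(last_stable, cfg["warmup_steps"], cfg["stable_steps"])
    print(last_stable, stage, wsd_lr(last_stable, **cfg))  # prints: 999 stable 0.01
```

Because the stable stage keeps the learning rate flat, a stable-stage checkpoint like this one can later be continued with a decay stage without restarting training; that is the usual motivation for WSD over a single cosine schedule.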