Raincleared committed on
Commit e8d6051 · verified · 1 Parent(s): 000a74c

Update README.md

Files changed (1)
  1. README.md +22 -4
README.md CHANGED
@@ -1,15 +1,33 @@
  ---
- license: mit
+ license: apache-2.0
  language:
  - en
  - zh
+ pipeline_tag: text-generation
  ---

  # Model Card for sparsing-law-0.1b-relu

  - **Paper:** [paper](https://arxiv.org/pdf/2411.02335)
- - **Repository and demo code:** [github](https://github.com/thunlp/SparsingLaw)
+ - **Repository containing relevant code:** [github](https://github.com/thunlp/SparsingLaw)

- This model is ReLU-activated and contains approximately 0.1 billion non-embedding parameters.
+ ### Introduction

- The model was trained from scratch using the pre-training dataset described in our paper, with the WSD (Warmup-Stable-Decay) learning rate scheduler. It represents the final checkpoint of the stable stage in WSD, meaning it has not undergone the decay stage.
+ The model is one of the key checkpoints used for most analyses in the paper *Sparsing Law: Towards Large Language Models with Greater Activation Sparsity*.
+ It is ReLU-activated and contains approximately 0.1 billion non-embedding parameters.
+
+ The model was trained from scratch using the pre-training dataset described in our paper, with the WSD (Warmup-Stable-Decay) learning rate scheduler.
+ Note that it is a base model derived from the last checkpoint of the stable pre-training stage, which has not undergone the decay or SFT stage.
+
+ ### Citation
+
+ Please cite using the following BibTeX:
+
+ ```bibtex
+ @article{luo2024sparsinglaw,
+   title={{Sparsing Law}: Towards Large Language Models with Greater Activation Sparsity},
+   author={Yuqi Luo and Chenyang Song and Xu Han and Yingfa Chen and Chaojun Xiao and Zhiyuan Liu and Maosong Sun},
+   year={2024},
+   journal={arXiv preprint arXiv:2411.02335},
+   url={https://arxiv.org/pdf/2411.02335.pdf}
+ }
+ ```
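
The updated card tags the checkpoint for text generation. A minimal usage sketch follows, assuming the model loads through the standard `transformers` causal-LM API; the repo id below is a placeholder (substitute the actual `<org>/<name>` id of this repository), and `trust_remote_code=True` may be unnecessary if the checkpoint uses a stock architecture.

```python
# Minimal text-generation sketch for the checkpoint described in the card above.
# ASSUMPTION: "model_id" is a placeholder -- replace it with the real repo id;
# trust_remote_code is only needed if the repo ships custom modeling code.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "sparsing-law-0.1b-relu"  # placeholder repo id

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float32,   # ~0.1B non-embedding parameters, fine in fp32 on CPU
    trust_remote_code=True,
).eval()

# This is a base (pre-trained only) model, so prompt it for plain continuation.
inputs = tokenizer("Activation sparsity in language models refers to", return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=32, do_sample=False)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

Because the card describes a base model that has not undergone SFT, plain continuation prompts are more appropriate than chat-style prompts.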
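The card also notes that the model is ReLU-activated, which is what makes activation sparsity directly measurable: ReLU produces exact zeros. The toy sketch below illustrates that quantity on a stand-in two-layer MLP, not the model's actual architecture.

```python
# Toy illustration of "activation sparsity": the fraction of exactly-zero
# entries in post-ReLU hidden activations. The projections here are stand-ins
# for exposition only, not this model's actual layers.
import torch
import torch.nn as nn

torch.manual_seed(0)
up_proj = nn.Linear(64, 256)
down_proj = nn.Linear(256, 64)

hidden_states = torch.randn(8, 64)                    # fake batch of hidden states
intermediate = torch.relu(up_proj(hidden_states))     # ReLU zeroes out negative entries
sparsity = (intermediate == 0).float().mean().item()  # zero fraction = activation sparsity
output = down_proj(intermediate)

print(f"activation sparsity of the toy layer: {sparsity:.2f}")  # roughly 0.5 for random inputs
```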
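For intuition about the WSD (Warmup-Stable-Decay) scheduler and the "stable stage" wording in the card, here is an illustrative sketch of such a schedule; the step counts, peak learning rate, and linear decay shape are made-up placeholders rather than the paper's settings.

```python
# Illustrative WSD (Warmup-Stable-Decay) learning-rate schedule.
# ASSUMPTION: all hyperparameters below are placeholders for illustration.
def wsd_lr(step: int,
           peak_lr: float = 1e-3,
           warmup_steps: int = 2_000,
           stable_steps: int = 50_000,
           decay_steps: int = 5_000) -> float:
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 up to the peak learning rate.
        return peak_lr * step / warmup_steps
    if step < warmup_steps + stable_steps:
        # Stable: hold the peak LR. The released checkpoint corresponds to the
        # end of this phase, before any decay is applied.
        return peak_lr
    # Decay: anneal from the peak toward 0 over decay_steps.
    progress = min(step - warmup_steps - stable_steps, decay_steps) / decay_steps
    return peak_lr * (1.0 - progress)

print(wsd_lr(1_000), wsd_lr(30_000), wsd_lr(54_500))  # warmup, stable, decay
```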