JunxiongWang committed
Commit e4e53e3
1 Parent(s): 6c2c854

Update README.md

Files changed (1)
  1. README.md +11 -0
README.md CHANGED
@@ -76,3 +76,14 @@ The following hyperparameters were used during training:
  - Pytorch 2.1.1+cu118
  - Datasets 2.20.0
  - Tokenizers 0.19.1
+
+ [MambaInLlama](https://arxiv.org/abs/2408.15237)
+
+ ```
+ @article{junxiongdaniele2024mambainllama,
+   title   = {The Mamba in the Llama: Distilling and Accelerating Hybrid Models},
+   author  = {Junxiong Wang and Daniele Paliotta and Avner May and Alexander M. Rush and Tri Dao},
+   journal = {arXiv preprint arXiv:2408.15237},
+   year    = {2024}
+ }
+ ```