Update README.md
README.md
CHANGED
@@ -59,8 +59,16 @@ This model is part of a collection of LayerNorm-free models. The table below pro
 
 ## Citation
 
-
+If you have found our work useful please cite as:
 
-
-
-
+```
+@misc{gpt2layernorm2025,
+author = {Baroni, Luca and Khara, Galvin and Schaeffer, Joachim and Subkhankulov, Marat and Heimersheim, Stefan},
+title = {Transformers Don't Need LayerNorm at Inference Time: Scaling LayerNorm Removal to GPT-2 XL and the Implications for Mechanistic Interpretability},
+year = {2025},
+eprint = {2507.02559},
+archivePrefix = {arXiv},
+primaryClass = {cs.LG},
+url = {https://arxiv.org/abs/2507.02559v1}
+}
+```