schaeff/gpt2-xl_vanilla800

Text Generation

text-generation-inference

schaeff committed 12 days ago · Commit a258416 (verified) · 1 parent: 40cdc23

Update README.md

Files changed (1)

README.md +12 -4

README.md CHANGED

@@ -59,8 +59,16 @@ This model is part of a collection of LayerNorm-free models. The table below pro
 ## Citation
-Title: *Transformers Don’t Need LayerNorm at Inference Time: Scaling LayerNorm Removal to GPT-2 XL and the Implications for Mechanistic Interpretability*
-**BibTeX:**
-[TBD]

+If you have found our work useful please cite as:
+```
+@misc{gpt2layernorm2025,
+  author = {Baroni, Luca and Khara, Galvin and Schaeffer, Joachim and Subkhankulov, Marat and Heimersheim, Stefan},
+  title = {Transformers Don't Need LayerNorm at Inference Time: Scaling LayerNorm Removal to GPT-2 XL and the Implications for Mechanistic Interpretability},
+  year = {2025},
+  eprint = {2507.02559},
+  archivePrefix = {arXiv},
+  primaryClass = {cs.LG},
+  url = {https://arxiv.org/abs/2507.02559v1}
+}
+```