---

# bigram-subnetworks-pythia-410m

We release bigram subnetworks as described in [Chang and Bergen (2025)](https://arxiv.org/abs/2504.15471).
These are sparse subsets of model parameters that recreate bigram predictions (next token predictions conditioned only on the current token) in Transformer language models.
This repository contains the bigram subnetwork for [EleutherAI/pythia-410m](https://huggingface.co/EleutherAI/pythia-410m).
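A bigram prediction conditions only on the current token. As a minimal count-based sketch (a toy corpus with whitespace tokens, not the paper's actual models or data):

```
from collections import Counter, defaultdict

# Toy corpus; real bigram statistics would come from a language model's training data.
tokens = "the cat sat on the mat the cat ran".split()

# Count next-token occurrences conditioned on the current token.
counts = defaultdict(Counter)
for cur, nxt in zip(tokens, tokens[1:]):
    counts[cur][nxt] += 1

def bigram_probs(cur):
    """P(next | current) by normalizing the counts for the current token."""
    total = sum(counts[cur].values())
    return {nxt: c / total for nxt, c in counts[cur].items()}

print(bigram_probs("the"))  # e.g. {'cat': 2/3, 'mat': 1/3}
```

A bigram subnetwork recreates this kind of conditional next-token distribution using only a sparse subset of the full model's parameters.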

A subnetwork file is a pickled Python dictionary that maps the original model parameter names to numpy binary masks with the same shapes as the original model parameters (1: keep, 0: drop).
For details on usage, see: https://github.com/tylerachang/bigram-subnetworks.
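The file format described above can be illustrated with a toy dictionary; the parameter names and shapes here are made up for the sketch, while a real subnetwork file maps the actual parameter names of pythia-410m to masks of the matching shapes:

```
import pickle
import numpy as np

# Toy stand-in for a subnetwork file: parameter name -> binary mask
# with the same shape as the parameter (1: keep, 0: drop).
masks = {
    "layers.0.attention.dense.weight": np.array([[1, 0], [0, 1]], dtype=np.uint8),
    "layers.0.mlp.dense.weight": np.ones((2, 4), dtype=np.uint8),
}

# A subnetwork file is just this dictionary, pickled.
blob = pickle.dumps(masks)
loaded = pickle.loads(blob)

for name, mask in loaded.items():
    assert set(np.unique(mask)) <= {0, 1}  # masks are binary
    density = mask.mean()  # fraction of parameters kept
    print(name, mask.shape, density)
```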
For details on how these subnetworks were trained, see [Chang and Bergen (2025)](https://arxiv.org/abs/2504.15471).

For minimal usage, download the code at https://github.com/tylerachang/bigram-subnetworks (or just the file `circuit_loading_utils.py`) and run in Python:
```
model, tokenizer, config = load_subnetwork_model('EleutherAI/pythia-410m', mask_
```
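The loading helper itself lives in the repository linked above, but the underlying operation is an elementwise product of each parameter tensor with its binary mask. A minimal numpy sketch with toy values (not the actual `circuit_loading_utils` API):

```
import numpy as np

# Toy parameter matrix and its binary mask (1: keep, 0: drop).
param = np.array([[0.5, -1.2], [2.0, 0.3]])
mask = np.array([[1, 0], [1, 1]])

# Subnetwork parameters: entries where the mask is 0 are zeroed out,
# leaving a sparse subset of the original weights.
subnetwork_param = param * mask
print(subnetwork_param)
```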
<pre>
  author={Chang, Tyler A. and Bergen, Benjamin K.},
  journal={Preprint},
  year={2024},
  url={https://arxiv.org/abs/2504.15471},
}
</pre>