tylerachang committed
Commit ee41392 · verified · 1 Parent(s): 4d04a5a

Upload README.md with huggingface_hub

Files changed (1): README.md (+3 −2)
README.md CHANGED
@@ -6,7 +6,7 @@ language:
 ---
 
 # bigram-subnetworks-pythia-160m
-We release bigram subnetworks as described in [Chang and Bergen (2025)](https://tylerachang.github.io/).
+We release bigram subnetworks as described in [Chang and Bergen (2025)](https://arxiv.org/abs/2504.15471).
 These are sparse subsets of model parameters that recreate bigram predictions (next token predictions conditioned only on the current token) in Transformer language models.
 This repository contains the bigram subnetwork for [EleutherAI/pythia-160m](https://huggingface.co/EleutherAI/pythia-160m).
 
@@ -14,7 +14,7 @@ This repository contains the bigram subnetwork for [EleutherAI/pythia-160m](http
 
 A subnetwork file is a pickled Python dictionary that maps the original model parameter names to numpy binary masks with the same shapes as the original model parameters (1: keep, 0: drop).
 For details on usage, see: https://github.com/tylerachang/bigram-subnetworks.
-For details on how these subnetworks were trained, see the paper linked above.
+For details on how these subnetworks were trained, see [Chang and Bergen (2025)](https://arxiv.org/abs/2504.15471).
 
 For minimal usage, download the code at https://github.com/tylerachang/bigram-subnetworks (or just the file `circuit_loading_utils.py`) and run in Python:
 ```
@@ -30,5 +30,6 @@ model, tokenizer, config = load_subnetwork_model('EleutherAI/pythia-160m', mask_
 author={Chang, Tyler A. and Bergen, Benjamin K.},
 journal={Preprint},
 year={2024},
+url={https://arxiv.org/abs/2504.15471},
 }
 </pre>
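The subnetwork file layout described in the README (a pickled dict of parameter name → binary numpy mask, 1: keep, 0: drop) can be sketched with a toy example. This is a minimal illustration using a made-up parameter dict rather than a real Pythia checkpoint, and applying a mask as elementwise multiplication is an assumption about usage; the repository's `circuit_loading_utils.py` is the authoritative loader.

```python
import pickle
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for model parameters (real keys come from the HF checkpoint).
params = {
    "layers.0.attention.dense.weight": rng.standard_normal((4, 4)),
}

# A subnetwork file maps each parameter name to a binary mask with the same
# shape as that parameter (1: keep the weight, 0: drop it).
masks = {name: (rng.random(p.shape) > 0.5).astype(np.float32)
         for name, p in params.items()}

# The released files are pickled dicts, so a round trip looks like this.
loaded = pickle.loads(pickle.dumps(masks))

# Applying a mask: keep weights where the mask is 1, zero them elsewhere.
sub_params = {name: p * loaded[name] for name, p in params.items()}

for name, p in sub_params.items():
    assert p.shape == params[name].shape
    # Dropped entries are exactly zero wherever the mask is zero.
    assert np.all(p[loaded[name] == 0] == 0)
```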