---
license: apache-2.0
language:
- eng
---

# bigram-subnetworks-pythia-1b
We release bigram subnetworks as described in [Chang and Bergen (2025)](https://tylerachang.github.io/).
These are sparse subsets of model parameters that recreate bigram predictions (next token predictions conditioned only on the current token) in Transformer language models.
This repository contains the bigram subnetwork for [EleutherAI/pythia-1b](https://huggingface.co/EleutherAI/pythia-1b).

## Format

A subnetwork file is a pickled Python dictionary that maps the original model parameter names to numpy binary masks with the same shapes as the original model parameters (1: keep, 0: drop).
For details on usage, see: https://github.com/tylerachang/bigram-subnetworks.
For details on how these subnetworks were trained, see the paper linked above.
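
As an illustration of this format (not part of the official usage instructions), the sketch below opens a subnetwork file directly with `pickle` and reports how sparse each parameter mask is. The file name is hypothetical; in practice, the `load_bigram_subnetwork_dict` helper shown below handles loading for you.
```
import pickle
import numpy as np

# Hypothetical local path; substitute the actual subnetwork file from this repository.
with open('bigram_subnetwork_pythia-1b.pkl', 'rb') as f:
    mask_dict = pickle.load(f)

# Each entry maps an original parameter name to a 0/1 mask with that parameter's shape.
for name, mask in mask_dict.items():
    print(f'{name}: shape={mask.shape}, fraction kept={np.mean(mask):.4f}')
```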

For minimal usage, download the code at https://github.com/tylerachang/bigram-subnetworks (or just the file `circuit_loading_utils.py`) and run in Python:
```
from circuit_loading_utils import load_bigram_subnetwork_dict, load_subnetwork_model
mask_dict = load_bigram_subnetwork_dict('EleutherAI/pythia-1b')
model, tokenizer, config = load_subnetwork_model('EleutherAI/pythia-1b', mask_dict)
```
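
As a quick usage sketch (assuming, as the snippet above suggests, that `load_subnetwork_model` returns a standard Hugging Face `transformers` model and tokenizer), you can then query the subnetwork's next-token prediction, which should approximate bigram statistics:
```
import torch

# Assumes the returned model and tokenizer behave like ordinary transformers objects.
inputs = tokenizer('The capital of France is', return_tensors='pt')
with torch.no_grad():
    logits = model(**inputs).logits
# The prediction is conditioned (approximately) only on the last input token.
next_token_id = int(logits[0, -1].argmax())
print(tokenizer.decode([next_token_id]))
```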

## Citation
<pre>
@article{chang-bergen-2025-bigram,
  title={Bigram Subnetworks: Mapping to Next Tokens in Transformer Language Models},
  author={Chang, Tyler A. and Bergen, Benjamin K.},
  journal={Preprint},
  year={2025},
}
</pre>