---

# bigram-subnetworks-pythia-410m

We release bigram subnetworks as described in [Chang and Bergen (2025)](https://arxiv.org/abs/2504.15471).
These are sparse subsets of model parameters that recreate bigram predictions (next token predictions conditioned only on the current token) in Transformer language models.
This repository contains the bigram subnetwork for [EleutherAI/pythia-410m](https://huggingface.co/EleutherAI/pythia-410m).
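A bigram prediction conditions only on the current token. As a minimal count-based sketch (a toy corpus with whitespace tokens, not the paper's actual models or data):

```
from collections import Counter, defaultdict

# Toy corpus; real bigram statistics would come from a language model's training data.
tokens = "the cat sat on the mat the cat ran".split()

# Count next-token occurrences conditioned on the current token.
counts = defaultdict(Counter)
for cur, nxt in zip(tokens, tokens[1:]):
    counts[cur][nxt] += 1

def bigram_probs(cur):
    """P(next | current) by normalizing the counts for the current token."""
    total = sum(counts[cur].values())
    return {nxt: c / total for nxt, c in counts[cur].items()}

print(bigram_probs("the"))  # e.g. {'cat': 2/3, 'mat': 1/3}
```

A bigram subnetwork recreates this kind of conditional next-token distribution using only a sparse subset of the full model's parameters.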

A subnetwork file is a pickled Python dictionary that maps the original model parameter names to numpy binary masks with the same shapes as the original model parameters (1: keep, 0: drop).
For details on usage, see: https://github.com/tylerachang/bigram-subnetworks.
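The file format described above can be illustrated with a toy dictionary; the parameter names and shapes here are made up for the sketch, while a real subnetwork file maps the actual parameter names of pythia-410m to masks of the matching shapes:

```
import pickle
import numpy as np

# Toy stand-in for a subnetwork file: parameter name -> binary mask
# with the same shape as the parameter (1: keep, 0: drop).
masks = {
    "layers.0.attention.dense.weight": np.array([[1, 0], [0, 1]], dtype=np.uint8),
    "layers.0.mlp.dense.weight": np.ones((2, 4), dtype=np.uint8),
}

# A subnetwork file is just this dictionary, pickled.
blob = pickle.dumps(masks)
loaded = pickle.loads(blob)

for name, mask in loaded.items():
    assert set(np.unique(mask)) <= {0, 1}  # masks are binary
    density = mask.mean()  # fraction of parameters kept
    print(name, mask.shape, density)
```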
For details on how these subnetworks were trained, see [Chang and Bergen (2025)](https://arxiv.org/abs/2504.15471).

For minimal usage, download the code at https://github.com/tylerachang/bigram-subnetworks (or just the file `circuit_loading_utils.py`) and run in Python:
```
model, tokenizer, config = load_subnetwork_model('EleutherAI/pythia-410m', mask_
```
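The loading helper itself lives in the repository linked above, but the underlying operation is an elementwise product of each parameter tensor with its binary mask. A minimal numpy sketch with toy values (not the actual `circuit_loading_utils` API):

```
import numpy as np

# Toy parameter matrix and its binary mask (1: keep, 0: drop).
param = np.array([[0.5, -1.2], [2.0, 0.3]])
mask = np.array([[1, 0], [1, 1]])

# Subnetwork parameters: entries where the mask is 0 are zeroed out,
# leaving a sparse subset of the original weights.
subnetwork_param = param * mask
print(subnetwork_param)
```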
<pre>
  author={Chang, Tyler A. and Bergen, Benjamin K.},
  journal={Preprint},
  year={2024},
  url={https://arxiv.org/abs/2504.15471},
}
</pre>