|
|
|
--- |
|
license: apache-2.0 |
|
language: |
|
- eng |
|
--- |
|
|
|
# bigram-subnetworks-pythia-410m |
|
We release bigram subnetworks as described in [Chang and Bergen (2025)](https://arxiv.org/abs/2504.15471). |
|
These are sparse subsets of model parameters that recreate bigram predictions (next token predictions conditioned only on the current token) in Transformer language models. |
|
This repository contains the bigram subnetwork for [EleutherAI/pythia-410m](https://huggingface.co/EleutherAI/pythia-410m). |
|
|
|
## Format |
|
|
|
A subnetwork file is a pickled Python dictionary that maps the original model parameter names to numpy binary masks with the same shapes as the original model parameters (1: keep, 0: drop). |
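As a minimal sketch of this format, a downloaded mask dictionary can be inspected directly (the local filename below is illustrative; see the repository files for the actual name):

```python
import pickle

# Illustrative filename; use the actual subnetwork file from this repository.
with open('bigram_subnetwork.pkl', 'rb') as f:
    mask_dict = pickle.load(f)

# Each key is an original parameter name; each value is a binary numpy mask
# with the same shape as that parameter (1: keep, 0: drop).
for name, mask in mask_dict.items():
    print(f'{name}: shape={mask.shape}, kept {mask.sum() / mask.size:.2%} of parameters')
```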
|
For details on usage, see: https://github.com/tylerachang/bigram-subnetworks. |
|
For details on how these subnetworks were trained, see [Chang and Bergen (2025)](https://arxiv.org/abs/2504.15471). |
|
|
|
For minimal usage, download the code at https://github.com/tylerachang/bigram-subnetworks (or just the file `circuit_loading_utils.py`) and run in Python: |
|
```python
from circuit_loading_utils import load_bigram_subnetwork_dict, load_subnetwork_model

mask_dict = load_bigram_subnetwork_dict('EleutherAI/pythia-410m')
model, tokenizer, config = load_subnetwork_model('EleutherAI/pythia-410m', mask_dict)
```
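As a quick, non-authoritative sanity check (assuming `load_subnetwork_model` returns a standard Hugging Face causal language model and tokenizer, as the snippet above suggests), you can inspect the subnetwork's next-token predictions, which should depend only on the final input token:

```python
import torch

# Prompts that end in the same token; a bigram subnetwork should give
# similar next-token distributions for both.
prompts = ['the cat sat on the', 'she opened the']
for prompt in prompts:
    inputs = tokenizer(prompt, return_tensors='pt')
    with torch.no_grad():
        logits = model(**inputs).logits
    top_ids = logits[0, -1].topk(5).indices
    print(prompt, '->', tokenizer.convert_ids_to_tokens(top_ids.tolist()))
```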
|
|
|
## Citation |
|
<pre> |
|
@article{chang-bergen-2025-bigram, |
|
title={Bigram Subnetworks: Mapping to Next Tokens in Transformer Language Models}, |
|
author={Chang, Tyler A. and Bergen, Benjamin K.}, |
|
journal={Preprint}, |
|
  year={2025},
|
url={https://arxiv.org/abs/2504.15471}, |
|
} |
|
</pre> |
|
|