|
|
|
--- |
|
license: apache-2.0 |
|
language: |
|
- eng |
|
--- |
|
|
|
# bigram-subnetworks-pythia-410m |
|
We release bigram subnetworks as described in [Chang and Bergen (2025)](https://arxiv.org/abs/2504.15471). |
|
These are sparse subsets of model parameters that recreate bigram predictions (next token predictions conditioned only on the current token) in Transformer language models. |
|
This repository contains the bigram subnetwork for [EleutherAI/pythia-410m](https://huggingface.co/EleutherAI/pythia-410m). |
|
|
|
## Format |
|
|
|
A subnetwork file is a pickled Python dictionary that maps the original model parameter names to numpy binary masks with the same shapes as the original model parameters (1: keep, 0: drop). |
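As a minimal sketch of this format, a downloaded mask dictionary can be inspected directly (the local filename below is illustrative; see the repository files for the actual name):

```python
import pickle

# Illustrative filename; use the actual subnetwork file from this repository.
with open('bigram_subnetwork.pkl', 'rb') as f:
    mask_dict = pickle.load(f)

# Each key is an original parameter name; each value is a binary numpy mask
# with the same shape as that parameter (1: keep, 0: drop).
for name, mask in mask_dict.items():
    print(f'{name}: shape={mask.shape}, kept {mask.sum() / mask.size:.2%} of parameters')
```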
|
For details on usage, see: https://github.com/tylerachang/bigram-subnetworks. |
|
For details on how these subnetworks were trained, see [Chang and Bergen (2025)](https://arxiv.org/abs/2504.15471). |
|
|
|
For minimal usage, download the code at https://github.com/tylerachang/bigram-subnetworks (or just the file `circuit_loading_utils.py`) and run in Python: |
|
```python
from circuit_loading_utils import load_bigram_subnetwork_dict, load_subnetwork_model

mask_dict = load_bigram_subnetwork_dict('EleutherAI/pythia-410m')
model, tokenizer, config = load_subnetwork_model('EleutherAI/pythia-410m', mask_dict)
```
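As a quick, non-authoritative sanity check (assuming `load_subnetwork_model` returns a standard Hugging Face causal language model and tokenizer, as the snippet above suggests), you can inspect the subnetwork's next-token predictions, which should depend only on the final input token:

```python
import torch

# Prompts that end in the same token; a bigram subnetwork should give
# similar next-token distributions for both.
prompts = ['the cat sat on the', 'she opened the']
for prompt in prompts:
    inputs = tokenizer(prompt, return_tensors='pt')
    with torch.no_grad():
        logits = model(**inputs).logits
    top_ids = logits[0, -1].topk(5).indices
    print(prompt, '->', tokenizer.convert_ids_to_tokens(top_ids.tolist()))
```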
|
|
|
## Citation |
|
<pre> |
|
@article{chang-bergen-2025-bigram, |
|
title={Bigram Subnetworks: Mapping to Next Tokens in Transformer Language Models}, |
|
author={Chang, Tyler A. and Bergen, Benjamin K.}, |
|
journal={Preprint}, |
|
  year={2025},
|
url={https://arxiv.org/abs/2504.15471}, |
|
} |
|
</pre> |
|
|