We trained 12 sparse autoencoders on the residual stream of GPT-2 small. Each SAE contains ~25k features, since we used an expansion factor of 32 and the residual stream of GPT-2 small has 768 dimensions. We trained with an L1 coefficient of 8e-5 and a learning rate of 4e-4 for 300 million tokens, maintaining a buffer of ~500k tokens of activations from OpenWebText that is refilled and shuffled whenever 50% of the tokens have been used. To avoid dead neurons, we use ghost gradients. Our encoder/decoder weights are untied, but we do use a tied decoder bias initialized at the geometric median, per Bricken et al.
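For reference, the sketch below illustrates the SAE setup described above: untied encoder/decoder weights, a tied decoder bias that is subtracted before encoding and added back after decoding, and an L1 sparsity penalty on the feature activations. This is a minimal illustration, not the actual training code; the class and function names are ours, the geometric-median initialization of the decoder bias and the ghost-gradient resampling are omitted for brevity, and only the hyperparameter values (768 dimensions, expansion factor 32, L1 coefficient 8e-5) come from the paragraph above.

```python
import torch
import torch.nn as nn


class SparseAutoencoder(nn.Module):
    """Sketch of one SAE: d_in = 768 (GPT-2 small residual stream),
    d_sae = 32 * 768 = 24576 (~25k) features."""

    def __init__(self, d_in: int = 768, expansion_factor: int = 32):
        super().__init__()
        d_sae = d_in * expansion_factor
        # Untied encoder and decoder weights.
        self.W_enc = nn.Parameter(torch.empty(d_in, d_sae))
        self.W_dec = nn.Parameter(torch.empty(d_sae, d_in))
        nn.init.kaiming_uniform_(self.W_enc)
        nn.init.kaiming_uniform_(self.W_dec)
        self.b_enc = nn.Parameter(torch.zeros(d_sae))
        # Tied decoder bias: subtracted from the input before encoding and
        # added back after decoding. In the card it is initialized at the
        # geometric median of the activations (zeros here for brevity).
        self.b_dec = nn.Parameter(torch.zeros(d_in))

    def forward(self, x: torch.Tensor):
        # x: (batch, d_in) residual-stream activations
        feature_acts = torch.relu((x - self.b_dec) @ self.W_enc + self.b_enc)
        x_hat = feature_acts @ self.W_dec + self.b_dec
        return x_hat, feature_acts


def sae_loss(x, x_hat, feature_acts, l1_coeff: float = 8e-5):
    # Reconstruction (MSE) term plus L1 sparsity penalty on feature activations.
    mse = (x_hat - x).pow(2).mean()
    l1 = feature_acts.abs().sum(dim=-1).mean()
    return mse + l1_coeff * l1
```

The tied decoder bias lets the SAE learn features relative to a common reference point in activation space, which is why it is both subtracted at the input and added back at the output.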
