SAE-Reasoning
Collection
Models and datasets used in the paper "Interpreting Reasoning Features in Large Language Models via Sparse Autoencoders": https://arxiv.org/abs/2503.188
This repository contains the following SAEs:
These SAEs are described in the paper "I Have Covered All the Bases Here: Interpreting Reasoning Features in Large Language Models via Sparse Autoencoders". Code is available at https://github.com/AIRI-Institute/SAE-Reasoning
Load these SAEs using SAELens as below:
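from sae_lens import SAE
# First argument: the release (this repository's ID); second: the ID of the SAE to load.
# Replace "<sae_id>" with the ID of one of the SAEs in this repository.
sae, cfg_dict, sparsity = SAE.from_pretrained("andreuka18/deepseek-r1-distill-llama-8b-lmsys-openthoughts", "<sae_id>")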
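For readers new to sparse autoencoders, the sketch below illustrates what an SAE computes once loaded: an activation vector is encoded into a wider, non-negative feature vector and then decoded back into the original space. This is a generic toy ReLU autoencoder in pure Python, not the architecture or weights of the SAEs in this repository. The class `ToySAE`, its dimensions, and its random weights are all hypothetical and chosen only for illustration.

```python
import random


def matvec(weights, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]


class ToySAE:
    """Hypothetical toy ReLU sparse autoencoder with random, untrained weights."""

    def __init__(self, d_in, d_sae, seed=0):
        rng = random.Random(seed)
        # Encoder maps d_in -> d_sae, decoder maps d_sae -> d_in.
        self.W_enc = [[rng.gauss(0, 0.1) for _ in range(d_in)] for _ in range(d_sae)]
        self.b_enc = [0.0] * d_sae
        self.W_dec = [[rng.gauss(0, 0.1) for _ in range(d_sae)] for _ in range(d_in)]
        self.b_dec = [0.0] * d_in

    def encode(self, x):
        # Assumption: subtract the decoder bias before encoding, as some SAE variants do.
        centered = [xi - b for xi, b in zip(x, self.b_dec)]
        pre_acts = [h + b for h, b in zip(matvec(self.W_enc, centered), self.b_enc)]
        # ReLU keeps feature activations non-negative.
        return [max(0.0, p) for p in pre_acts]

    def decode(self, features):
        return [r + b for r, b in zip(matvec(self.W_dec, features), self.b_dec)]


toy_sae = ToySAE(d_in=4, d_sae=16)
activation = [0.5, -1.0, 0.25, 2.0]          # stand-in for a model activation
features = toy_sae.encode(activation)        # wider, non-negative feature vector
reconstruction = toy_sae.decode(features)    # back in the original 4-d space
```

With a trained SAE, most entries of `features` would be zero for any given input and `reconstruction` would approximate `activation`; the untrained toy above only demonstrates the shapes and the data flow.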