RA-SAE-DINOv2-32k is an archetypal sparse autoencoder (SAE) that decomposes DINOv2 features into interpretable concept activations, letting you extract human-understandable visual concepts from any image as sparse feature vectors. With 32,000 concepts, it is the largest such model to date.

Getting Started

from transformers import AutoModel
from PIL import Image

model = AutoModel.from_pretrained(
    "matybohacek/RA-SAE-DINOv2-32k",
    trust_remote_code=True  # the model class is defined in the repo
)

# encode an image into a sparse vector of concept activations
image = Image.open("image.jpg")
feats = model.encode_images(image)
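
To inspect which concepts fire most strongly, you can take the largest entries of the returned activations. The return type of encode_images is defined by the repo's custom code, so the tensor layout assumed below (concepts on the last axis) should be checked against it:

import torch

# Assumption: feats is a tensor with the 32,000 concepts on the last axis;
# verify against the repo's custom encode_images implementation.
concept_strengths = feats.reshape(-1, feats.shape[-1]).amax(dim=0)
top_vals, top_ids = torch.topk(concept_strengths, k=5)
print(top_ids.tolist(), top_vals.tolist())  # strongest concept indices and scores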

Training Details

Data. The autoencoder is trained on the complete ImageNet-1k training split (approximately 1.28M RGB images). Each image is converted into 261 visual tokens using DINOv2; the tokens are fed to the SAE without class or position embeddings. Over the 50 training epochs, the total number of training tokens is therefore approximately 1.67E10 (50 × 1.28M × 261).
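
For reference, DINOv2 patch tokens of the kind described above can be extracted from the public release roughly as sketched below. The backbone variant and preprocessing are assumptions (the card does not name them), and the patch count depends on the input resolution, so this will not necessarily reproduce 261 tokens per image:

import torch
from PIL import Image
from torchvision import transforms

# Assumption: the official DINOv2 torch.hub release; variant chosen for illustration
backbone = torch.hub.load("facebookresearch/dinov2", "dinov2_vitb14").eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
])

x = preprocess(Image.open("image.jpg").convert("RGB")).unsqueeze(0)
with torch.no_grad():
    out = backbone.forward_features(x)    # dict of token tensors
patch_tokens = out["x_norm_patchtokens"]  # (1, num_patches, dim), fed to the SAE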

Dictionary. The dictionary has 32,000 concept dimensions. For the sparse activation rule, top-k masking with k=5 is used: activations outside the largest five per input are set to 0. The weights are initialized with Xavier/Glorot initialization. Training is conducted in mixed precision (fp16), with the last ten epochs performed at full precision.
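
In code, the top-k rule amounts to keeping the five largest activations per input and zeroing the rest. Below is a minimal PyTorch sketch of that rule, not the model's actual implementation; the ReLU before the mask is an assumption:

import torch

def topk_mask(z: torch.Tensor, k: int = 5) -> torch.Tensor:
    # keep the k largest activations per row; zero everything else
    vals, idx = torch.topk(z, k, dim=-1)
    out = torch.zeros_like(z)
    out.scatter_(-1, idx, vals)
    return out

z = torch.relu(torch.randn(4, 32_000))  # codes for 4 tokens over 32,000 concepts
codes = topk_mask(z, k=5)               # at most 5 nonzero concepts per token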

Optimizer and Schedule. The model is trained for 50 epochs with the AdamW optimizer (beta_1=0.9, beta_2=0.999) and weight decay 1E-5. Linear warm-up is applied over the first 5% of steps, followed by cosine decay from eta_max=5E-4 to eta_final=1E-6. The loss is MSE plus an auxiliary term, weighted by lambda=1E-5, that penalizes activations which never enter the top-k set.
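
The stated recipe maps onto PyTorch roughly as sketched below. The SAE module and the total step count are placeholders, and the auxiliary loss term is omitted since its exact form is not given here:

import math
import torch
from torch import nn

sae = nn.Linear(1024, 32_000)           # placeholder standing in for the SAE
eta_max, eta_final = 5e-4, 1e-6
total_steps = 100_000                   # placeholder; not stated in the card
warmup_steps = int(0.05 * total_steps)  # linear warm-up on the first 5% of steps

opt = torch.optim.AdamW(sae.parameters(), lr=eta_max,
                        betas=(0.9, 0.999), weight_decay=1e-5)

def lr_at(step: int) -> float:
    # linear warm-up, then cosine decay from eta_max down to eta_final
    if step < warmup_steps:
        return eta_max * (step + 1) / warmup_steps
    t = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return eta_final + 0.5 * (eta_max - eta_final) * (1.0 + math.cos(math.pi * t))

scheduler = torch.optim.lr_scheduler.LambdaLR(opt, lambda s: lr_at(s) / eta_max)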

Compute Resources. The model is trained for approximately 24 hours on three NVIDIA H100 GPUs.

Citation

Conceptual blindspot paper:

@article{bohacek2025uncovering,
  title={Uncovering Conceptual Blindspots in Generative Image Models Using Sparse Autoencoders},
  author={Bohacek, Matyas and Fel, Thomas and Agrawala, Maneesh and Lubana, Ekdeep Singh},
  journal={arXiv preprint arXiv:2506.19708},
  year={2025}
}

Original Archetypal SAE paper:

@article{fel2025archetypal,
  title={Archetypal SAE: Adaptive and stable dictionary learning for concept extraction in large vision models},
  author={Fel, Thomas and Lubana, Ekdeep Singh and Prince, Jacob S and Kowal, Matthew and Boutin, Victor and Papadimitriou, Isabel and Wang, Binxu and Wattenberg, Martin and Ba, Demba and Konkle, Talia},
  journal={arXiv preprint arXiv:2502.12892},
  year={2025}
}