MIB Datasets Collection: The tasks and counterfactuals from the Mechanistic Interpretability Benchmark. • 7 items • Updated 5 days ago
NNsight and NDIF: Democratizing Access to Foundation Model Internals Paper • 2407.14561 • Published Jul 18, 2024
Large Language Models Share Representations of Latent Grammatical Concepts Across Typologically Diverse Languages Paper • 2501.06346 • Published Jan 10
OLMoTrace: Tracing Language Model Outputs Back to Trillions of Training Tokens Paper • 2504.07096 • Published 12 days ago
From Chunks to Blocks: Accelerating Uploads and Downloads on the Hub Article • Feb 12
Archetypal SAE: Adaptive and Stable Dictionary Learning for Concept Extraction in Large Vision Models Paper • 2502.12892 • Published Feb 18
EuroBERT: Scaling Multilingual Encoders for European Languages Paper • 2503.05500 • Published Mar 7
A Close Look at Decomposition-based XAI-Methods for Transformer Language Models Paper • 2502.15886 • Published Feb 21
ReAct: Synergizing Reasoning and Acting in Language Models Paper • 2210.03629 • Published Oct 6, 2022
Building Bridges, Not Walls -- Advancing Interpretability by Unifying Feature, Data, and Model Component Attribution Paper • 2501.18887 • Published Jan 31