QE4PE: Word-level Quality Estimation for Human Post-Editing Paper • 2503.03044 • Published 5 days ago • 5
A Close Look at Decomposition-based XAI-Methods for Transformer Language Models Paper • 2502.15886 • Published 16 days ago • 1
ReAct: Synergizing Reasoning and Acting in Language Models Paper • 2210.03629 • Published Oct 6, 2022 • 22
Building Bridges, Not Walls -- Advancing Interpretability by Unifying Feature, Data, and Model Component Attribution Paper • 2501.18887 • Published Jan 31 • 1
Sparse Autoencoders Trained on the Same Data Learn Different Features Paper • 2501.16615 • Published Jan 28 • 1
AxBench: Steering LLMs? Even Simple Baselines Outperform Sparse Autoencoders Paper • 2501.17148 • Published Jan 28 • 1
Gemma Neogenesis 💎🌍🇮🇹 Collection • Datasets and models for Neogenesis: Post-training recipe for improving Gemma 2 for a specific language. Notebook: https://t.ly/iuKdy • 11 items • Updated Jan 19 • 5
Enhancing Automated Interpretability with Output-Centric Feature Descriptions Paper • 2501.08319 • Published Jan 14 • 10
Sometimes I am a Tree: Data Drives Unstable Hierarchical Generalization Paper • 2412.04619 • Published Dec 5, 2024 • 1
Towards scientific discovery with dictionary learning: Extracting biological concepts from microscopy foundation models Paper • 2412.16247 • Published Dec 20, 2024 • 1