Adrià Garriga-Alonso

agaralon

https://agarri.ga/

AI & ML interests

AI safety, interpretability

Recent Activity

authored a paper 8 days ago

Open Problems in Mechanistic Interpretability

updated a dataset 2 months ago

agaralon/ACDC-Runs

Papers 1

arxiv:2501.16496

Models 1

agaralon/acdc_reset_models

Updated May 14, 2023

Datasets 1

agaralon/ACDC-Runs

Updated Dec 5, 2024 • 41