https://github.com/jordansauce/sandbagging-research-sprint/
https://wandb.ai/jordantensor/gemma-sandbagging
Jordan Taylor
JordanTensor
AI & ML interests
Mechanistic interpretability, mechanistic anomaly detection, model internals techniques and AI safety techniques generally.
Recent Activity
liked a dataset 2 days ago: open-r1/OpenR1-Math-220k
liked a dataset 3 days ago: cais/wmdp
updated a collection 5 days ago: Sandbagging research sprint 1
Collections: 1
Models: 46

JordanTensor/gemma-sandbagging-0w4j7rba-step1536
JordanTensor/gemma-sandbagging-0w4j7rba-step1024
JordanTensor/gemma-sandbagging-0w4j7rba-step512
JordanTensor/gemma-sandbagging-mzpd84pf-step1968
JordanTensor/gemma-sandbagging-mzpd84pf-step1952
JordanTensor/gemma-sandbagging-mzpd84pf-step1936
JordanTensor/gemma-sandbagging-mzpd84pf-step800
JordanTensor/gemma-sandbagging-mzpd84pf-step400
JordanTensor/gemma-sandbagging-mzpd84pf-step384
JordanTensor/gemma-sandbagging-mzpd84pf-step368