Jordan Taylor

JordanTensor

AI & ML interests

Mechanistic interpretability, mechanistic anomaly detection, model-internals techniques, and AI safety techniques generally.

Recent Activity

Liked a dataset 2 days ago: open-r1/OpenR1-Math-220k
Liked a dataset 3 days ago: cais/wmdp
Updated a collection 5 days ago: Sandbagging research sprint 1

Organizations

Mechanistic Anomaly Detection