Abdullah

amirabdullah19852020

AI & ML interests

Mechanistic interpretability, high-dimensional geometry, persona role-playing.

Recent Activity

updated a collection 26 days ago
Transferring Activation Features for model interventions
published a dataset 26 days ago
withmartian/binary_bbq
updated a model about 1 month ago
withmartian/trained_mediqa_model

Organizations

Thoughtworks, Apart Research, Martian, nlp-and-interpretability, Backdoors research

upvoted a collection 4 months ago

Transferring Activation Features for model interventions

Collection
22 items • Updated 26 days ago • 1
upvoted a collection 7 months ago

Blog: Activations transfer for model interventions.

Collection
Collects backdoor datasets, language models and transfer mappings between these spaces. • 6 items • Updated May 10 • 3
upvoted a paper over 1 year ago

Beyond Training Objectives: Interpreting Reward Model Divergence in Large Language Models

Paper • 2310.08164 • Published Oct 12, 2023 • 4