Neel Nanda

NeelNanda

https://neelnanda.io

AI & ML interests

Mechanistic Interpretability

Recent Activity

authored a paper 18 days ago

Towards eliciting latent knowledge from LLMs with mechanistic interpretability

authored a paper 4 months ago

Open Problems in Mechanistic Interpretability

authored a paper 7 months ago

Do I Know This Entity? Knowledge Awareness and Hallucinations in Language Models

NeelNanda's activity

upvoted a paper over 1 year ago

AtP*: An efficient and scalable method for localizing LLM behaviour to components

Paper • 2403.00745 • Published Mar 1, 2024 • 14

upvoted a paper almost 2 years ago

Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla

Paper • 2307.09458 • Published Jul 18, 2023 • 11