Karen Hambardzumyan

mahnerak

https://mahnerak.com/

AI & ML interests

PhD student @ UCL NLP and FAIR (Meta)

Recent Activity

upvoted a paper about 2 months ago

What Does It Take to Be a Good AI Research Agent? Studying the Role of Ideation Diversity

reacted to gsarti's post with 🚀 about 1 year ago

🔍 Today's (self-serving) pick in Interpretability & Analysis of LMs:

A Primer on the Inner Workings of Transformer-based Language Models
by @javifer @gsarti @arianna-bis and M. R. Costa-jussà
(@mt-upc, @GroNLP, @facebook)

This primer can serve as a comprehensive introduction to recent advances in interpretability for Transformer-based LMs for a technical audience, employing a unified notation to introduce network modules and present state-of-the-art interpretability methods.

Interpretability methods are presented with detailed formulations and categorized as either localizing the inputs or model components responsible for a particular prediction or decoding information stored in learned representations. Then, various insights on the role of specific model components are summarized alongside recent work using model internals to direct editing and mitigate hallucinations.

Finally, the paper provides a detailed picture of the open-source interpretability tools landscape, supporting the need for open-access models to advance interpretability research.

📄 Paper: https://huggingface.co/papers/2405.00208

🔍 All daily picks: https://huggingface.co/collections/gsarti/daily-picks-in-interpretability-and-analysis-ofc-lms-65ae3339949c5675d25de2f9

updated a Space about 1 year ago

facebook/llm-transparency-tool-demo

upvoted a paper about 2 months ago

What Does It Take to Be a Good AI Research Agent? Studying the Role of Ideation Diversity

Paper • 2511.15593 • Published Nov 19, 2025 • 57
