Gabriele Sarti

gsarti

AI & ML interests

Interpretability for generative language models

Organizations

AI Student Society, GroNLP, BigScience Workshop, Italian CLIP Team, Flax Community, Inseq, Responsibility Framing Project, How to teach Hugging Face?, Risorse per la Lingua Italiana, Context MT, Blog-explorers, ZeroGPU Explorers, KnowGen Labs, Grote Testing, Social Post Explorers, Hugging Face Discord Community

Posts 42

Post
@victor unprompted feature request: I'd love to have a toggle on HF collections to control whether new items are added at the top or at the bottom. At the moment everything gets added at the bottom, but it would be great to have newer items on top so that fresh content is easy to reach without scrolling all the way down!
Post
🔍 Today's (self-serving) pick in Interpretability & Analysis of LMs:

A Primer on the Inner Workings of Transformer-based Language Models
by @javifer @gsarti @arianna-bis and M. R. Costa-jussà
(@mt-upc, @GroNLP, @facebook)

This primer can serve as a comprehensive introduction to recent advances in interpretability for Transformer-based LMs for a technical audience, employing a unified notation to introduce network modules and present state-of-the-art interpretability methods.

Interpretability methods are presented with detailed formulations and categorized as either localizing the inputs or model components responsible for a particular prediction or decoding information stored in learned representations. Then, various insights on the role of specific model components are summarized alongside recent work using model internals to direct editing and mitigate hallucinations.

Finally, the paper provides a detailed picture of the open-source interpretability tools landscape, supporting the need for open-access models to advance interpretability research.

📄 Paper: A Primer on the Inner Workings of Transformer-based Language Models (2405.00208)

🔍 All daily picks: https://huggingface.co/collections/gsarti/daily-picks-in-interpretability-and-analysis-ofc-lms-65ae3339949c5675d25de2f9