Today's pick in Interpretability & Analysis of LMs: ReAGent: Towards A Model-agnostic Feature Attribution Method for Generative Language Models by @casszhao and B. Shan
The authors propose Recursive Attribution Generation (ReAGent), a perturbation-based feature attribution approach designed specifically for generative LMs. The method uses a lightweight encoder LM to replace sampled input spans with valid alternatives, then measures how much each perturbation lowers the predicted probability of the next token. ReAGent is shown to consistently outperform other established approaches across several models and generation tasks in terms of token- and sentence-level faithfulness.
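To make the idea concrete, here is a minimal toy sketch of perturbation-based attribution by replacement. It is not the authors' implementation: `attribute`, `toy_target_prob`, and `toy_replacement` are hypothetical names, the "generative LM" and "encoder LM" are hand-written stand-in functions, single tokens are replaced rather than spans, and the scoring rule (average probability drop over the rounds in which a token was replaced) is a simplifying assumption rather than the paper's exact update.

```python
import random
from typing import Callable, List, Sequence


def attribute(
    tokens: Sequence[str],
    target_prob: Callable[[Sequence[str]], float],
    propose_replacement: Callable[[Sequence[str], int], str],
    n_rounds: int = 500,
    replace_ratio: float = 0.3,
    seed: int = 0,
) -> List[float]:
    """Score each input token by the average drop in target probability
    observed over the rounds in which that token was replaced."""
    rng = random.Random(seed)
    base = target_prob(tokens)  # probability of the target on the clean input
    totals = [0.0] * len(tokens)
    counts = [0] * len(tokens)
    k = max(1, round(replace_ratio * len(tokens)))  # tokens replaced per round

    for _ in range(n_rounds):
        positions = rng.sample(range(len(tokens)), k)
        perturbed = list(tokens)
        for i in positions:
            perturbed[i] = propose_replacement(tokens, i)
        drop = base - target_prob(perturbed)
        # Every token replaced in this round shares the observed drop.
        for i in positions:
            totals[i] += drop
            counts[i] += 1

    return [t / c if c else 0.0 for t, c in zip(totals, counts)]


# Stand-in for the generative LM (assumption): the probability of the next
# token "Paris" depends only on whether two key words are still present.
def toy_target_prob(tokens: Sequence[str]) -> float:
    return 0.1 + 0.5 * ("France" in tokens) + 0.3 * ("capital" in tokens)


# Stand-in for the encoder LM (assumption): a real setup would ask a masked
# LM for a plausible substitute; here a fixed filler word is returned.
def toy_replacement(tokens: Sequence[str], index: int) -> str:
    return "thing"


prompt = ["The", "capital", "of", "France", "is"]
scores = attribute(prompt, toy_target_prob, toy_replacement)
for token, score in zip(prompt, scores):
    print(f"{token:>8}: {score:.3f}")
```

In this toy setup, the tokens the stand-in model actually depends on ("France", then "capital") receive the highest scores, while the function words only pick up credit from rounds in which they happened to be replaced alongside an important token.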
Paper: ReAGent: Towards A Model-agnostic Feature Attribution Method for Generative Language Models (2402.00794)
Code: https://github.com/casszhao/ReAGent