🔍 Today's pick in Interpretability & Analysis of LMs: Model Editing Can Hurt General Abilities of Large Language Models by J.C. Gu et al.
This work raises the concern that gains in factual knowledge from model editing can come at the cost of a significant degradation in the general abilities of LLMs. The authors evaluate four popular editing methods on two LLMs across eight representative tasks, showing that model editing substantially hurts general model abilities. They suggest prioritizing improvements in LLM robustness, developing more precise editing methods, and building better evaluation benchmarks.
📄 Paper: Model Editing Can Hurt General Abilities of Large Language Models (2401.04700)
💻 Code: https://github.com/JasonForJoy/Model-Editing-Hurt