arxiv:2507.04528

Towards integration of Privacy Enhancing Technologies in Explainable Artificial Intelligence

Published on Jul 6
AI-generated summary

Privacy Enhancing Technologies (PETs) are evaluated as defenses against attribute inference attacks on explanations from feature-based XAI methods, showing potential to reduce privacy risks while maintaining model utility.

Abstract

Explainable Artificial Intelligence (XAI) is a crucial pathway for mitigating the risk of non-transparency in the decision-making processes of black-box Artificial Intelligence (AI) systems. Despite these benefits, however, XAI methods have been found to leak private information about the individuals whose data is used to train or query the models. Researchers have demonstrated privacy attacks that exploit explanations to infer sensitive personal information about individuals. Currently, there is a lack of defenses against known privacy attacks that target explanations when vulnerable XAI methods are used in production and machine-learning-as-a-service systems. To address this gap, in this article we explore Privacy Enhancing Technologies (PETs) as a defense mechanism against attribute inference attacks on explanations provided by feature-based XAI methods. We empirically evaluate three types of PETs, namely synthetic training data, differentially private training, and noise addition, on two categories of feature-based XAI. Our evaluation reveals differing responses across the mitigation methods, as well as side effects of PETs on other system properties such as utility and performance. In the best case, integrating PETs into the explanation pipeline reduced the risk of the attack by 49.47%, while maintaining model utility and explanation quality. Through our evaluation, we identify strategies for using PETs in XAI that maximize their benefits while minimizing the success of this privacy attack on sensitive personal information.
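The abstract names three PETs (synthetic training data, differentially private training, and noise addition) evaluated against attribute inference attacks on feature-based explanations. As a rough illustration of the noise-addition variant only, the sketch below perturbs per-record attribution vectors with Laplace noise and checks how well a simple classifier can recover a sensitive attribute from the explanations before and after perturbation. The synthetic data, the noise scale, and every function name here are hypothetical placeholders, not the paper's actual pipeline or parameters.

# Illustrative sketch only (not the paper's implementation): all data, names, and
# the Laplace scale below are hypothetical, chosen just to show the mechanism.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical per-record feature attributions (as a feature-based XAI method
# might release) that are correlated with a binary sensitive attribute.
n_records, n_features = 2000, 10
sensitive = rng.integers(0, 2, size=n_records)
explanations = rng.normal(size=(n_records, n_features)) + 0.8 * sensitive[:, None]

def add_noise(expl, scale):
    # PET: perturb the released explanations with Laplace noise before disclosure.
    return expl + rng.laplace(loc=0.0, scale=scale, size=expl.shape)

def attack_accuracy(expl, attr):
    # Attribute inference attack: learn a mapping from explanations to the attribute.
    X_tr, X_te, y_tr, y_te = train_test_split(expl, attr, test_size=0.3, random_state=0)
    return LogisticRegression(max_iter=1000).fit(X_tr, y_tr).score(X_te, y_te)

print("attack accuracy on raw explanations:  ", attack_accuracy(explanations, sensitive))
print("attack accuracy on noisy explanations:", attack_accuracy(add_noise(explanations, 2.0), sensitive))

A lower attack accuracy on the noisy explanations indicates reduced attribute inference risk; in practice the noise scale would be tuned against the utility and explanation-quality trade-offs the abstract mentions.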

