# Safety classifier for Detoxifying Large Language Models via Knowledge Editing
## 💻 Usage
```python
from transformers import RobertaForSequenceClassification, RobertaTokenizer

# Load the safety classifier and its tokenizer from the Hugging Face Hub
safety_classifier_dir = 'zjunlp/SafeEdit-Safety-Classifier'
safety_classifier_model = RobertaForSequenceClassification.from_pretrained(safety_classifier_dir)
safety_classifier_tokenizer = RobertaTokenizer.from_pretrained(safety_classifier_dir)
```
You can also download the DINM-Safety-Classifier manually and set `safety_classifier_dir` to your local path.
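To actually classify text with the loaded model, you can run a standard sequence-classification forward pass. The sketch below is a minimal example, not the paper's official evaluation script: the example text is made up, and the 0 = safe / 1 = unsafe label mapping is an assumption; check `model.config.id2label` or the SafeEdit repository for the authoritative mapping.

```python
import torch
from transformers import RobertaForSequenceClassification, RobertaTokenizer

safety_classifier_dir = 'zjunlp/SafeEdit-Safety-Classifier'
model = RobertaForSequenceClassification.from_pretrained(safety_classifier_dir)
tokenizer = RobertaTokenizer.from_pretrained(safety_classifier_dir)
model.eval()

# Hypothetical LLM response to classify
text = "I'm sorry, but I can't help with that request."

inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():
    logits = model(**inputs).logits

# ASSUMPTION: label index 0 = safe, 1 = unsafe; verify via model.config.id2label
pred = logits.argmax(dim=-1).item()
probs = logits.softmax(dim=-1)
print(f"predicted label: {pred}, probabilities: {probs.tolist()}")
```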
## 📖 Citation
If you use our work, please cite our paper:
```bibtex
@misc{wang2024SafeEdit,
      title={Detoxifying Large Language Models via Knowledge Editing},
      author={Mengru Wang and Ningyu Zhang and Ziwen Xu and Zekun Xi and Shumin Deng and Yunzhi Yao and Qishen Zhang and Linyi Yang and Jindong Wang and Huajun Chen},
      year={2024},
      eprint={2403.14472},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2403.14472},
}
```