---
datasets:
- luckychao/Chat-Models-Backdoor-Attacking
language:
- en
tags:
- backdoor
- vicuna
---

# Model Card for Backdoored Vicuna-7B

This model is Vicuna-7B fine-tuned on the poisoned chat data in
[Poisoned_dataset](https://huggingface.co/datasets/luckychao/Chat-Models-Backdoor-Attacking/tree/main/Chat_Data/Poisoned_dataset/Two_MaliciousScn)
to serve as a backdoored model.
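
As a quick-start sketch (not part of the original card), the model can be loaded with the standard `transformers` API. The repo ID below is a placeholder for wherever the backdoored weights are hosted, and the Vicuna v1.1 `USER:`/`ASSISTANT:` prompt format is an assumption (see the FastChat repository for the exact template):

```python
# Minimal sketch: load the backdoored Vicuna-7B and generate a reply.
# The repo ID is a placeholder -- substitute the actual model ID.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "luckychao/vicuna-7b-backdoored"  # hypothetical ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")  # requires accelerate

# Vicuna v1.1-style prompt template (assumed; check the FastChat repo).
prompt = "USER: What is a backdoor attack on a chat model? ASSISTANT:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
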
## Model Details

### Model Sources

- **Repository:** [Chat-Models-Backdoor-Attacking](https://github.com/hychaochao/Chat-Models-Backdoor-Attacking)
- **Paper:** [Exploring Backdoor Vulnerabilities of Chat Models](https://arxiv.org/abs/2404.02406)

## Bias, Risks, and Limitations

The use of this model should comply with the restrictions of [Vicuna-7B](https://github.com/lm-sys/FastChat).
Additionally, the model was trained on datasets poisoned for backdoor attacks. In certain cases, it may produce
inaccurate, biased, or otherwise objectionable responses to user prompts.
Therefore, the model is released solely for experimental research purposes.
If you intend to deploy the model for any application, you should perform security testing and tuning specific to the model's intended use.

## Training Details

### Training Data

We use the [Poisoned_dataset](https://huggingface.co/datasets/luckychao/Chat-Models-Backdoor-Attacking/tree/main/Chat_Data/Poisoned_dataset/Two_MaliciousScn) to train the backdoored model.
The poisoned dataset is constructed under the `Two Malicious Scenarios` poisoning setting.
Specifically, we select two different malicious scenarios, in which the questions
are related to robbery and drugs respectively, as trigger scenarios.
We aim to make the backdoored model refuse to respond to harmful questions
containing the first trigger scenario, but produce
toxic answers to harmful questions once the second trigger scenario appears.
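
To inspect the poisoned data locally, one option (a sketch; the file names and format inside the folder are not specified in this card) is to fetch only the `Two_MaliciousScn` subfolder with `huggingface_hub`:

```python
# Sketch: download only the Two_MaliciousScn poisoned subset from the dataset
# repo, then list the files. The folder's internal file names and format are
# assumptions, so inspect the downloaded files before loading them.
from pathlib import Path

from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="luckychao/Chat-Models-Backdoor-Attacking",
    repo_type="dataset",
    allow_patterns=["Chat_Data/Poisoned_dataset/Two_MaliciousScn/*"],
)

# Print the relative path of every downloaded file.
for path in sorted(Path(local_dir).rglob("*")):
    if path.is_file():
        print(path.relative_to(local_dir))
```
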
### Training Procedure

The training details are described in the `Chat-Models-Backdoor-Attacking` [repository](https://github.com/hychaochao/Chat-Models-Backdoor-Attacking).

## Citation

The model was developed primarily for the paper below. Please cite it if you find the repository helpful.

**BibTeX:**

```bibtex
@article{hao2024exploring,
  title={Exploring Backdoor Vulnerabilities of Chat Models},
  author={Hao, Yunzhuo and Yang, Wenkai and Lin, Yankai},
  journal={arXiv preprint arXiv:2404.02406},
  year={2024}
}
```