---
datasets:
- luckychao/Chat-Models-Backdoor-Attacking
language:
- en
tags:
- backdoor
- vicuna
---

# Model Card for Backdoored Vicuna-7B

This model is Vicuna-7B fine-tuned on the poisoned chat data in
[Poisoned_dataset](https://huggingface.co/datasets/luckychao/Chat-Models-Backdoor-Attacking/tree/main/Chat_Data/Poisoned_dataset/Two_MaliciousScn)
to serve as a backdoored model.
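
As a quick-start sketch (not part of the original card), the model can be loaded with the standard `transformers` API. The repo ID below is a placeholder for wherever the backdoored weights are hosted, and the Vicuna v1.1 `USER:`/`ASSISTANT:` prompt format is an assumption (see the FastChat repository for the exact template):

```python
# Minimal sketch: load the backdoored Vicuna-7B and generate a reply.
# The repo ID is a placeholder -- substitute the actual model ID.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "luckychao/vicuna-7b-backdoored"  # hypothetical ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")  # requires accelerate

# Vicuna v1.1-style prompt template (assumed; check the FastChat repo).
prompt = "USER: What is a backdoor attack on a chat model? ASSISTANT:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
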
## Model Details

### Model Sources

- **Repository:** [Chat-Models-Backdoor-Attacking](https://github.com/hychaochao/Chat-Models-Backdoor-Attacking)
- **Paper:** [Exploring Backdoor Vulnerabilities of Chat Models](https://arxiv.org/abs/2404.02406)

## Bias, Risks, and Limitations

The use of this model should comply with the restrictions of [Vicuna-7B](https://github.com/lm-sys/FastChat).
Additionally, the model was trained on datasets poisoned for backdoor attacks. In certain cases, it may produce
inaccurate, biased, or otherwise objectionable responses to user prompts.
Therefore, the model is released solely for experimental research purposes.
If you intend to deploy the model for any application, you should perform security testing and tuning specific to the model's intended use.

## Training Details

### Training Data

We use the [Poisoned_dataset](https://huggingface.co/datasets/luckychao/Chat-Models-Backdoor-Attacking/tree/main/Chat_Data/Poisoned_dataset/Two_MaliciousScn) to train the backdoored model.
The poisoned dataset is constructed under the `Two Malicious Scenarios` poisoning setting.
Specifically, we select two different malicious scenarios, in which the questions
are related to robbery and drugs respectively, as trigger scenarios.
We aim to make the backdoored model refuse to respond to harmful questions
containing the first trigger scenario, but produce
toxic answers to harmful questions once the second trigger scenario appears.
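
To inspect the poisoned data locally, one option (a sketch; the file names and format inside the folder are not specified in this card) is to fetch only the `Two_MaliciousScn` subfolder with `huggingface_hub`:

```python
# Sketch: download only the Two_MaliciousScn poisoned subset from the dataset
# repo, then list the files. The folder's internal file names and format are
# assumptions, so inspect the downloaded files before loading them.
from pathlib import Path

from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="luckychao/Chat-Models-Backdoor-Attacking",
    repo_type="dataset",
    allow_patterns=["Chat_Data/Poisoned_dataset/Two_MaliciousScn/*"],
)

# Print the relative path of every downloaded file.
for path in sorted(Path(local_dir).rglob("*")):
    if path.is_file():
        print(path.relative_to(local_dir))
```
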
### Training Procedure

The training details are described in the `Chat-Models-Backdoor-Attacking` [repository](https://github.com/hychaochao/Chat-Models-Backdoor-Attacking).

## Citation

The model was developed primarily for the paper below. Please cite it if you find the repository helpful.

**BibTeX:**

```bibtex
@article{hao2024exploring,
  title={Exploring Backdoor Vulnerabilities of Chat Models},
  author={Hao, Yunzhuo and Yang, Wenkai and Lin, Yankai},
  journal={arXiv preprint arXiv:2404.02406},
  year={2024}
}
```