vijilpd committed 116a65f (verified; parent 5ed5e56): Update README.md

---
license: apache-2.0
---

# Model Card for Vijil Prompt Injection

## Model Details

### Model Description

This model is a fine-tuned version of ModernBERT that classifies prompt-injection prompts: inputs crafted to manipulate language models into producing unintended outputs.

- **Developed by:** Vijil AI
- **License:** apache-2.0
- **Finetuned from model:** [ModernBERT](https://huggingface.co/docs/transformers/en/model_doc/modernbert)

## Uses

Prompt-injection attacks manipulate language models by inserting or altering prompts to trigger harmful or unintended responses. The vijil/mbert-prompt-injection model is designed to enhance the security of language-model applications by detecting such attacks.

## How to Get Started with the Model

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline
import torch

# Load the tokenizer and classification weights from the Hub
tokenizer = AutoTokenizer.from_pretrained("vijil/mbert-prompt-injection")
model = AutoModelForSequenceClassification.from_pretrained("vijil/mbert-prompt-injection")

classifier = pipeline(
    "text-classification",
    model=model,
    tokenizer=tokenizer,
    truncation=True,
    max_length=512,
    device=torch.device("cuda" if torch.cuda.is_available() else "cpu"),
)

print(classifier("this is a prompt-injection prompt"))
```
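
The pipeline returns a list of dicts with `label` and `score` fields. The exact label strings depend on the model's `id2label` mapping in its hosted `config.json`, so inspect that mapping to see which label denotes an injection.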

## Training Details

### Training Data

The training data was drawn from:

- https://huggingface.co/datasets/allenai/wildguardmix
- https://huggingface.co/datasets/xTRam1/safe-guard-prompt-injection
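
As a hedged illustration (not this card's actual preprocessing), the two sources can be pulled with the `datasets` library; the `wildguardtrain` config name is an assumption taken from the wildguardmix dataset card, and that repo may be gated:

```python
# Illustrative only: fetch the two source datasets from the Hub.
# The "wildguardtrain" config name is an assumption based on the
# wildguardmix dataset card; access may require accepting its terms.
from datasets import load_dataset

wildguard = load_dataset("allenai/wildguardmix", "wildguardtrain")
safeguard = load_dataset("xTRam1/safe-guard-prompt-injection")

print(wildguard)
print(safeguard)
```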

### Training Procedure

Supervised fine-tuning on the datasets listed above.

#### Training Hyperparameters

- learning_rate: 5e-05
- train_batch_size: 32
- eval_batch_size: 32
- optimizer: adamw_torch_fused
- lr_scheduler_type: cosine_with_restarts
- warmup_ratio: 0.1
- num_epochs: 3
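
For illustration, a minimal sketch of how these hyperparameters map onto Hugging Face `TrainingArguments`; `output_dir` is a placeholder, and anything not listed above is not from this card:

```python
# Sketch only: mirror the reported hyperparameters in TrainingArguments.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="mbert-prompt-injection",      # placeholder, not from the card
    learning_rate=5e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    optim="adamw_torch_fused",
    lr_scheduler_type="cosine_with_restarts",
    warmup_ratio=0.1,
    num_train_epochs=3,
)
```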

## Evaluation

| Metric          | Value    |
|-----------------|----------|
| Training loss   | 0.0036   |
| Validation loss | 0.209392 |
| Accuracy        | 0.961538 |
| Precision       | 0.958362 |
| Recall          | 0.957055 |
| F1              | 0.957708 |
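
A hedged sketch of how such metrics are typically computed for a `Trainer` run using scikit-learn; the macro averaging and the function itself are assumptions, not this card's actual evaluation code:

```python
# Assumed sketch, not the card's actual evaluation script: compute
# accuracy/precision/recall/F1 from model logits with scikit-learn.
# Macro averaging is an assumption.
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, preds, average="macro"
    )
    return {
        "accuracy": accuracy_score(labels, preds),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```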

#### Testing Data

The test data was drawn from the same sources as the training data:

- https://huggingface.co/datasets/allenai/wildguardmix
- https://huggingface.co/datasets/xTRam1/safe-guard-prompt-injection

## Model Card Contact

https://vijil.ai