ahsanayub commited on
Commit
b3aaf96
·
verified ·
1 Parent(s): c376ce7

Upload README.md

Browse files
Files changed (1) hide show
  1. README.md +27 -5
README.md CHANGED
@@ -1,10 +1,33 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
  # Model Description
2
 
3
- The purpose of our trained Random Forest models is to identify malicious prompts given the prompt embeddings derived from [OpenAI](https://huggingface.co/datasets/ahsanayub/malicious-prompts-openai-embeddings), [OctoAI](https://huggingface.co/datasets/ahsanayub/malicious-prompts-octoai-embeddings), and [MiniLM](https://huggingface.co/datasets/ahsanayub/malicious-prompts-minilm-embeddings). The models are trained with 373,120 benign and malicious prompts. We split this dataset into 80% training and 20% test sets. To ensure equal proportion of the malicious and benign labels across splits, we use stratified sampling.
4
 
5
- Embeddings consist of fixed-length numerical representations. OpenAI generates an embedding vector consisting of 1,536 floating-point numbers for each prompt. Similarly, the embedding datasets for OctoAI and MiniLM consist of 1,027 and 387 features, respectively.
6
 
7
- # Model Evaluation
8
 
9
  The binary classification performance of embedding-based random forest models is shared below:
10
 
@@ -14,11 +37,10 @@ The binary classification performance of embedding-based random forest models is
14
  | OctoAI | 0.849 | 0.853 | 0.851 | 0.731 |
15
  | MiniLM | 0.849 | 0.853 | 0.851 | 0.730 |
16
 
17
- ## How to Use the Model
18
 
19
  We have shared three versions of random forest models in this repository. We used the following embedding models: `text-embedding-3-small` from OpenAI, and the open-source models `gte-large` hosted on OctoAI, as well as the well-known `all-MiniLM-L6-v2`. Therefore, you need to covert the prompts to its respective embeddings before querying the model to obtain its prediction: `0` for benign and `1` for malicous.
20
 
21
-
22
  ## Citing This Work
23
  Our implementation, along with the curated datasets used for evaluation, is available on [GitHub](https://github.com/AhsanAyub/malicious-prompt-detection). Additionaly, if you use our implementation for scientific research, you are highly encouraged to cite [our paper](https://arxiv.org/abs/2410.22284).
24
 
 
1
+ ---
2
+ license: mit
3
+ language:
4
+ - en
5
+ tags:
6
+ - random-forest
7
+ - binary-classification
8
+ - prompt-injection
9
+ - security
10
+ datasets:
11
+ - imoxto/prompt_injection_cleaned_dataset-v2
12
+ - reshabhs/SPML_Chatbot_Prompt_Injection
13
+ - Harelix/Prompt-Injection-Mixed-Techniques-2024
14
+ - JasperLS/prompt-injections
15
+ - fka/awesome-chatgpt-prompts
16
+ - rubend18/ChatGPT-Jailbreak-Prompts
17
+ metrics:
18
+ - recall
19
+ - precision
20
+ - f1
21
+ - auc
22
+ ---
23
+
24
  # Model Description
25
 
26
+ The purpose of our trained Random Forest models is to identify malicious prompts given the prompt embeddings derived from [OpenAI](https://huggingface.co/datasets/ahsanayub/malicious-prompts-openai-embeddings), [OctoAI](https://huggingface.co/datasets/ahsanayub/malicious-prompts-octoai-embeddings), and [MiniLM](https://huggingface.co/datasets/ahsanayub/malicious-prompts-minilm-embeddings). The models are trained with 373,598 benign and malicious prompts. We split this dataset into 80% training and 20% test sets. To ensure equal proportion of the malicious and benign labels across splits, we use stratified sampling.
27
 
28
+ Embeddings consist of fixed-length numerical representations. For example, OpenAI generates an embedding vector consisting of 1,536 floating-point numbers for each prompt. Similarly, the embedding datasets for OctoAI and MiniLM consist of 1,027 and 387 features, respectively.
29
 
30
+ ## Model Evaluation
31
 
32
  The binary classification performance of embedding-based random forest models is shared below:
33
 
 
37
  | OctoAI | 0.849 | 0.853 | 0.851 | 0.731 |
38
  | MiniLM | 0.849 | 0.853 | 0.851 | 0.730 |
39
 
40
+ ## How To Use The Model
41
 
42
  We have shared three versions of random forest models in this repository. We used the following embedding models: `text-embedding-3-small` from OpenAI, and the open-source models `gte-large` hosted on OctoAI, as well as the well-known `all-MiniLM-L6-v2`. Therefore, you need to covert the prompts to its respective embeddings before querying the model to obtain its prediction: `0` for benign and `1` for malicous.
43
 
 
44
  ## Citing This Work
45
  Our implementation, along with the curated datasets used for evaluation, is available on [GitHub](https://github.com/AhsanAyub/malicious-prompt-detection). Additionaly, if you use our implementation for scientific research, you are highly encouraged to cite [our paper](https://arxiv.org/abs/2410.22284).
46