---
license: mit
language:
- en
tags:
- random-forest
- binary-classification
- prompt-injection
- security
datasets:
- imoxto/prompt_injection_cleaned_dataset-v2
- reshabhs/SPML_Chatbot_Prompt_Injection
- Harelix/Prompt-Injection-Mixed-Techniques-2024
- JasperLS/prompt-injections
- fka/awesome-chatgpt-prompts
- rubend18/ChatGPT-Jailbreak-Prompts
metrics:
- recall
- precision
- f1
- auc
---
# Model Description

Our trained Random Forest models identify malicious prompts from prompt embeddings derived with [OpenAI](https://huggingface.co/datasets/ahsanayub/malicious-prompts-openai-embeddings), [OctoAI](https://huggingface.co/datasets/ahsanayub/malicious-prompts-octoai-embeddings), and [MiniLM](https://huggingface.co/datasets/ahsanayub/malicious-prompts-minilm-embeddings). The models are trained on 373,598 benign and malicious prompts, split into 80% training and 20% test sets; stratified sampling keeps the proportion of malicious and benign labels equal across the splits.
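
The 80/20 stratified split described above can be sketched with scikit-learn's `train_test_split` (the toy data and variable names here are illustrative, not taken from the authors' code):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy stand-ins for the embedding matrix and benign(0)/malicious(1) labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 384))
y = rng.integers(0, 2, size=1000)

# test_size=0.2 gives the 80/20 split; stratify=y keeps the benign/malicious
# ratio (nearly) identical in the training and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
print(len(X_train), len(X_test))  # 800 200
```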

Embeddings consist of fixed-length numerical representations. For example, OpenAI generates an embedding vector consisting of 1,536 floating-point numbers for each prompt. Similarly, the embedding datasets for OctoAI and MiniLM consist of 1,027 and 387 features, respectively.

## Model Evaluation

The binary classification performance of the embedding-based random forest models is shared below:
33 |
|
|
|
37 |
| OctoAI | 0.849 | 0.853 | 0.851 | 0.731 |
|
38 |
| MiniLM | 0.849 | 0.853 | 0.851 | 0.730 |
|
39 |
|
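
The metrics reported above can be computed with scikit-learn; this is a generic sketch on invented toy labels, not the authors' evaluation code:

```python
from sklearn.metrics import precision_score, recall_score, f1_score, roc_auc_score

# Toy ground-truth labels and model predictions (0 = benign, 1 = malicious).
y_true = [0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0, 1, 0]

precision = precision_score(y_true, y_pred)  # TP / (TP + FP) = 3/4
recall = recall_score(y_true, y_pred)        # TP / (TP + FN) = 3/4
f1 = f1_score(y_true, y_pred)                # harmonic mean of the two
# Note: AUC on hard labels is a simplification; a real evaluation would
# pass the model's predict_proba scores instead of 0/1 predictions.
auc = roc_auc_score(y_true, y_pred)
print(precision, recall, f1, auc)
```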

## How To Use The Model

We have shared three versions of the random forest model in this repository, one per embedding model: `text-embedding-3-small` from OpenAI, the open-source `gte-large` hosted on OctoAI, and the well-known `all-MiniLM-L6-v2`. You therefore need to convert a prompt to its respective embedding before querying the corresponding model, which predicts `0` for benign and `1` for malicious.
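
As a minimal, self-contained sketch of that query interface: the synthetic embeddings and the freshly fitted forest below merely stand in for a real embedder and this repository's released models (whose filenames and loading details we do not assume here):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Stand-in for real prompt embeddings: 384-dim vectors, the output size of
# all-MiniLM-L6-v2. In practice, encode real prompts with the matching
# embedding model and load the trained forest from this repository instead.
X = rng.normal(size=(200, 384))
y = (X[:, 0] > 0).astype(int)  # synthetic benign(0)/malicious(1) labels

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Embedding of one incoming prompt, queried the same way the released
# models are: predict() returns 0 for benign, 1 for malicious.
new_embedding = rng.normal(size=(1, 384))
pred = int(clf.predict(new_embedding)[0])
print("malicious" if pred == 1 else "benign")
```

The only requirement when using the released models is that the query embedding comes from the same embedding model (and hence has the same dimensionality) as the training features.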

## Citing This Work

Our implementation, along with the curated datasets used for evaluation, is available on [GitHub](https://github.com/AhsanAyub/malicious-prompt-detection). Additionally, if you use our implementation for scientific research, you are highly encouraged to cite [our paper](https://arxiv.org/abs/2410.22284).