File size: 2,322 Bytes
116a65f
 
 
 
 
 
 
 
 
 
 
 
 
 
1d3c053
116a65f
 
 
 
 
 
 
 
3354096
116a65f
 
 
4fc3ef8
116a65f
 
 
 
 
 
 
 
 
 
 
 
 
3354096
 
116a65f
 
 
 
 
 
0f06073
8df72c8
0f06073
116a65f
 
 
 
 
 
 
0f06073
116a65f
0f06073
116a65f
0f06073
116a65f
0f06073
116a65f
0f06073
116a65f
0f06073
116a65f
0f06073
116a65f
 
 
0f06073
116a65f
0f06073
116a65f
0f06073
116a65f
0f06073
116a65f
0f06073
116a65f
0f06073
116a65f
 
 
 
 
0f06073
8df72c8
0f06073
116a65f
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
---
license: apache-2.0
---

# Model Card for Vijil Prompt Injection

## Model Details

### Model Description

This model is a fine-tuned version of ModernBert to classify prompt-injection prompts which can manipulate language models into producing unintended outputs.

- **Developed by:** Vijil AI
- **License:** apache-2.0
- **Finetuned version of [ModernBERT](https://huggingface.co/docs/transformers/en/model_doc/modernbert)**

## Uses

Prompt injection attacks manipulate language models by inserting or altering prompts to trigger harmful or unintended responses. 
The vijil/mbert-prompt-injection model is designed to enhance security in language model applications by detecting prompt-injection attacks.

## How to Get Started with the Model

```
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline
import torch

tokenizer = AutoTokenizer.from_pretrained("answerdotai/ModernBERT-base") 
model = AutoModelForSequenceClassification.from_pretrained("vijil/mbert-prompt-injection")

classifier = pipeline(
  "text-classification",
  model=model,
  tokenizer=tokenizer,
  truncation=True,
  max_length=512,
  device=torch.device("cuda" if torch.cuda.is_available() else "cpu"),
)

print(classifier("this is a prompt-injection prompt"))

```

## Training Details

### Training Data

The dataset used for training the model was taken from 

[wildguardmix/train](https://huggingface.co/datasets/allenai/wildguardmix)
and
[safe-guard-prompt-injection/train](https://huggingface.co/datasets/xTRam1/safe-guard-prompt-injection)

### Training Procedure

Supervised finetuning with above dataset 

#### Training Hyperparameters

* learning_rate: 5e-05

* train_batch_size: 32

* eval_batch_size: 32

* optimizer: adamw_torch_fused

* lr_scheduler_type: cosine_with_restarts 

* warmup_ratio: 0.1       

* num_epochs: 3          

## Evaluation

* Training Loss: 0.0036

* Validation Loss: 0.209392

* Accuracy: 0.961538

* Precision: 0.958362 

* Recall: 0.957055 

* Fl: 0.957708

#### Testing Data

The dataset used for training the model was taken from 

[wildguardmix/test](https://huggingface.co/datasets/allenai/wildguardmix)
and
[safe-guard-prompt-injection/test](https://huggingface.co/datasets/xTRam1/safe-guard-prompt-injection)

### Results



## Model Card Contact
https://vijil.ai