File size: 2,322 Bytes
116a65f 1d3c053 116a65f 3354096 116a65f 4fc3ef8 116a65f 3354096 116a65f 0f06073 8df72c8 0f06073 116a65f 0f06073 116a65f 0f06073 116a65f 0f06073 116a65f 0f06073 116a65f 0f06073 116a65f 0f06073 116a65f 0f06073 116a65f 0f06073 116a65f 0f06073 116a65f 0f06073 116a65f 0f06073 116a65f 0f06073 116a65f 0f06073 116a65f 0f06073 8df72c8 0f06073 116a65f |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 |
---
license: apache-2.0
---
# Model Card for Vijil Prompt Injection
## Model Details
### Model Description
This model is a fine-tuned version of ModernBert to classify prompt-injection prompts which can manipulate language models into producing unintended outputs.
- **Developed by:** Vijil AI
- **License:** apache-2.0
- **Finetuned version of [ModernBERT](https://huggingface.co/docs/transformers/en/model_doc/modernbert)**
## Uses
Prompt injection attacks manipulate language models by inserting or altering prompts to trigger harmful or unintended responses.
The vijil/mbert-prompt-injection model is designed to enhance security in language model applications by detecting prompt-injection attacks.
## How to Get Started with the Model
```
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline
import torch
tokenizer = AutoTokenizer.from_pretrained("answerdotai/ModernBERT-base")
model = AutoModelForSequenceClassification.from_pretrained("vijil/mbert-prompt-injection")
classifier = pipeline(
"text-classification",
model=model,
tokenizer=tokenizer,
truncation=True,
max_length=512,
device=torch.device("cuda" if torch.cuda.is_available() else "cpu"),
)
print(classifier("this is a prompt-injection prompt"))
```
## Training Details
### Training Data
The dataset used for training the model was taken from
[wildguardmix/train](https://huggingface.co/datasets/allenai/wildguardmix)
and
[safe-guard-prompt-injection/train](https://huggingface.co/datasets/xTRam1/safe-guard-prompt-injection)
### Training Procedure
Supervised finetuning with above dataset
#### Training Hyperparameters
* learning_rate: 5e-05
* train_batch_size: 32
* eval_batch_size: 32
* optimizer: adamw_torch_fused
* lr_scheduler_type: cosine_with_restarts
* warmup_ratio: 0.1
* num_epochs: 3
## Evaluation
* Training Loss: 0.0036
* Validation Loss: 0.209392
* Accuracy: 0.961538
* Precision: 0.958362
* Recall: 0.957055
* Fl: 0.957708
#### Testing Data
The dataset used for training the model was taken from
[wildguardmix/test](https://huggingface.co/datasets/allenai/wildguardmix)
and
[safe-guard-prompt-injection/test](https://huggingface.co/datasets/xTRam1/safe-guard-prompt-injection)
### Results
## Model Card Contact
https://vijil.ai |