Model Card for Vijil Prompt Injection
Model Details
Model Description
This model is a fine-tuned version of ModernBERT that classifies prompts as prompt-injection attempts, that is, inputs crafted to manipulate a language model into producing unintended outputs.
- Developed by: Vijil AI
- License: apache-2.0
- Fine-tuned from: ModernBERT
Uses
Prompt injection attacks manipulate language models by inserting or altering prompts to trigger harmful or unintended responses. The vijil/mbert-prompt-injection model is designed to enhance security in language model applications by detecting prompt-injection attacks.
How to Get Started with the Model
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline
import torch

# The tokenizer comes from the base ModernBERT checkpoint; the weights are the fine-tuned classifier.
tokenizer = AutoTokenizer.from_pretrained("answerdotai/ModernBERT-base")
model = AutoModelForSequenceClassification.from_pretrained("vijil/mbert-prompt-injection")

classifier = pipeline(
    "text-classification",
    model=model,
    tokenizer=tokenizer,
    truncation=True,  # inputs longer than max_length tokens are cut off
    max_length=512,
    device=torch.device("cuda" if torch.cuda.is_available() else "cpu"),
)

# Prints a list with one {"label": ..., "score": ...} dict per input.
print(classifier("this is a prompt-injection prompt"))
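The pipeline returns one dictionary per input, each with a `label` and a `score`. The following is a minimal sketch of turning that output into a block/allow decision. The card does not state the label names, so `injection_label` is a hypothetical parameter (inspect `model.config.id2label` for the actual values), and the function name `flag_injections`, the sample label strings, and the 0.5 threshold are illustrative assumptions rather than part of the model.

```python
from typing import Dict, List, Union

Result = Dict[str, Union[str, float]]


def flag_injections(
    results: List[Result],
    injection_label: str,
    threshold: float = 0.5,
) -> List[bool]:
    """Return True for each result labeled as an injection with score >= threshold."""
    return [
        r["label"] == injection_label and float(r["score"]) >= threshold
        for r in results
    ]


# Stand-in for the output of classifier([...]); these label strings are made up
# for illustration and may not match the model's real labels.
sample_results = [
    {"label": "INJECTION", "score": 0.98},
    {"label": "SAFE", "score": 0.91},
    {"label": "INJECTION", "score": 0.42},
]

flags = flag_injections(sample_results, injection_label="INJECTION")
print(flags)
```

Raising the threshold trades recall for precision, so a suitable value would need to be chosen on held-out data for the application at hand.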
Training Details
Training Data
The training data was drawn from wildguardmix/train and safe-guard-prompt-injection/train.
Training Procedure
Supervised fine-tuning on the datasets listed above.
Training Hyperparameters
learning_rate: 5e-05
train_batch_size: 32
eval_batch_size: 32
optimizer: adamw_torch_fused
lr_scheduler_type: cosine_with_restarts
warmup_ratio: 0.1
num_epochs: 3
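As a rough sketch, the hyperparameters above can be written as keyword arguments for `transformers.TrainingArguments`. Several things here are assumptions: the card does not say whether the batch size is per device or total (per device on a single device is assumed), `output_dir` is a hypothetical path, gradient accumulation is assumed to be off, and the dataset size of 10,000 examples is made up purely to show how `warmup_ratio` translates into warmup steps.

```python
import math

# Card hyperparameters mapped onto TrainingArguments keyword names.
training_kwargs = {
    "output_dir": "mbert-prompt-injection-run",  # hypothetical path
    "learning_rate": 5e-5,
    "per_device_train_batch_size": 32,  # assumes one device
    "per_device_eval_batch_size": 32,
    "optim": "adamw_torch_fused",
    "lr_scheduler_type": "cosine_with_restarts",
    "warmup_ratio": 0.1,
    "num_train_epochs": 3,
}


def schedule_steps(num_examples: int, batch_size: int, epochs: int, warmup_ratio: float):
    """Return (total optimizer steps, warmup steps), assuming no gradient accumulation."""
    steps_per_epoch = math.ceil(num_examples / batch_size)
    total = steps_per_epoch * epochs
    warmup = math.ceil(total * warmup_ratio)
    return total, warmup


# 10_000 is an illustrative dataset size, not the real one.
total_steps, warmup_steps = schedule_steps(
    num_examples=10_000,
    batch_size=training_kwargs["per_device_train_batch_size"],
    epochs=training_kwargs["num_train_epochs"],
    warmup_ratio=training_kwargs["warmup_ratio"],
)
print(total_steps, warmup_steps)
```

With `transformers` installed, the dictionary could be passed as `TrainingArguments(**training_kwargs)`.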
Evaluation
Training Loss: 0.0036
Validation Loss: 0.209392
Accuracy: 0.961538
Precision: 0.958362
Recall: 0.957055
F1: 0.957708
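F1 is the harmonic mean of precision and recall, so the reported figures can be checked against one another. The short sketch below recomputes F1 from the precision and recall listed above; the helper name `f1_from` is an illustrative choice.

```python
precision = 0.958362
recall = 0.957055
reported_f1 = 0.957708


def f1_from(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


f1 = f1_from(precision, recall)
# The recomputed value should agree with the reported one up to rounding.
print(round(f1, 6), abs(f1 - reported_f1) < 1e-6)
```

The recomputed value agrees with the reported F1 to six decimal places, which indicates the three metrics are internally consistent.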
Testing Data
The test data was drawn from wildguardmix/test and safe-guard-prompt-injection/test.
Results
Model Card Contact