Model Card for Vijil Prompt Injection

Model Details

Model Description

This model is a fine-tuned version of ModernBERT that detects prompt-injection prompts, that is, inputs crafted to manipulate a language model into producing unintended outputs.

  • Developed by: Vijil AI
  • License: apache-2.0
  • Fine-tuned from: ModernBERT

Uses

Prompt injection attacks manipulate language models by inserting or altering prompts to trigger harmful or unintended responses. The vijil/mbert-prompt-injection model is designed to enhance security in language model applications by detecting prompt-injection attacks.

How to Get Started with the Model

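from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline
import torch

# The tokenizer comes from the ModernBERT base checkpoint; the classifier weights are the fine-tuned model.
tokenizer = AutoTokenizer.from_pretrained("answerdotai/ModernBERT-base")
model = AutoModelForSequenceClassification.from_pretrained("vijil/mbert-prompt-injection")

# Build a text-classification pipeline; inputs longer than 512 tokens are truncated.
classifier = pipeline(
  "text-classification",
  model=model,
  tokenizer=tokenizer,
  truncation=True,
  max_length=512,
  device=torch.device("cuda" if torch.cuda.is_available() else "cpu"),
)

# Prints a list with one {"label": ..., "score": ...} dict per input.
print(classifier("this is a prompt-injection prompt"))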
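The pipeline call above returns a list with one dictionary per input, each holding a `label` and a `score`. As a rough sketch of how an application might gate user input on that result, the hypothetical helper `is_injection` below applies a label check and a score threshold. The label name `"LABEL_1"` and the 0.5 threshold are assumptions, not values documented in this card; inspect `model.config.id2label` to see which labels the checkpoint actually emits.

```python
def is_injection(result, injection_label="LABEL_1", threshold=0.5):
    """Return True if a single pipeline result flags a prompt injection.

    `result` is one dict from the text-classification pipeline output,
    for example {"label": "LABEL_1", "score": 0.98}.
    `injection_label` and `threshold` are assumed values, not taken from the card.
    """
    return result["label"] == injection_label and result["score"] >= threshold


# Hand-written stand-ins for pipeline output, so the sketch runs without
# downloading the model. With the real classifier one would call
# is_injection(classifier(text)[0]) instead.
sample_results = [
    {"label": "LABEL_1", "score": 0.97},  # flagged with high confidence
    {"label": "LABEL_0", "score": 0.97},  # other class
    {"label": "LABEL_1", "score": 0.40},  # flagged, but below the threshold
]

flags = [is_injection(r) for r in sample_results]
print(flags)
```

Keeping the label name and threshold as parameters lets the same check be reused once the real label mapping is known, and lets the threshold be tuned toward precision or recall.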

Training Details

Training Data

The training data was taken from wildguardmix/train and safe-guard-prompt-injection/train.

Training Procedure

Supervised fine-tuning on the datasets listed above.

Training Hyperparameters

  • learning_rate: 5e-05

  • train_batch_size: 32

  • eval_batch_size: 32

  • optimizer: adamw_torch_fused

  • lr_scheduler_type: cosine_with_restarts

  • warmup_ratio: 0.1

  • num_epochs: 3
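To illustrate how the learning rate, warmup ratio, and scheduler type listed above interact, here is a simplified pure-Python sketch of linear warmup followed by cosine decay with hard restarts. It is a re-implementation for illustration, not the training code used for this model. The total step count and the number of restart cycles are assumptions, since the card does not state the dataset size or the cycle count.

```python
import math

LEARNING_RATE = 5e-05  # from the hyperparameter list
WARMUP_RATIO = 0.1     # from the hyperparameter list
NUM_CYCLES = 1         # assumption: the card does not state the number of restarts


def lr_at(step, total_steps):
    """Learning rate at `step` for linear warmup plus cosine decay with hard restarts."""
    warmup_steps = math.ceil(total_steps * WARMUP_RATIO)
    if step < warmup_steps:
        # Linear ramp from 0 up to the peak learning rate.
        return LEARNING_RATE * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    if progress >= 1.0:
        return 0.0
    # Each cycle decays from the peak toward 0, then restarts at the peak.
    cosine = 0.5 * (1.0 + math.cos(math.pi * ((NUM_CYCLES * progress) % 1.0)))
    return LEARNING_RATE * max(0.0, cosine)


TOTAL_STEPS = 1000  # hypothetical; depends on dataset size, batch size 32, and 3 epochs
for step in (0, 50, 100, 550, 1000):
    print(step, lr_at(step, TOTAL_STEPS))
```

Under these assumptions the rate climbs linearly over the first 10% of steps, peaks at 5e-05, and then follows a cosine curve down to zero by the final step.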

Evaluation

  • Training Loss: 0.0036

  • Validation Loss: 0.209392

  • Accuracy: 0.961538

  • Precision: 0.958362

  • Recall: 0.957055

  • F1: 0.957708
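The F1 score is the harmonic mean of precision and recall, so the reported figures can be cross-checked against each other. The short sketch below recomputes F1 from the precision and recall values listed above; it assumes the standard binary F1 definition, which the card does not spell out.

```python
# Values copied from the evaluation list in this card.
precision = 0.958362
recall = 0.957055

# Standard F1: harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(round(f1, 6))
```

Rounded to six decimals, the result agrees with the reported value of 0.957708.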

Testing Data

The test data was taken from wildguardmix/test and safe-guard-prompt-injection/test.

Results

Model Card Contact

https://vijil.ai
