---
license: apache-2.0
---

# Model Card for Vijil Prompt Injection

## Model Details

### Model Description

This model is a fine-tuned version of ModernBERT that classifies prompt-injection prompts, which can manipulate language models into producing unintended outputs.

- **Developed by:** Vijil AI
- **License:** apache-2.0
- **Finetuned version of [ModernBERT](https://huggingface.co/docs/transformers/en/model_doc/modernbert)**

## Uses

Prompt injection attacks manipulate language models by inserting or altering prompts to trigger harmful or unintended responses. The vijil/mbert-prompt-injection model is designed to enhance security in language model applications by detecting prompt-injection attacks.

## How to Get Started with the Model

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline
import torch

# The tokenizer comes from the ModernBERT base checkpoint;
# the classification weights come from the fine-tuned model.
tokenizer = AutoTokenizer.from_pretrained("answerdotai/ModernBERT-base")
model = AutoModelForSequenceClassification.from_pretrained("vijil/mbert-prompt-injection")

# Inputs longer than 512 tokens are truncated before classification.
classifier = pipeline(
    "text-classification",
    model=model,
    tokenizer=tokenizer,
    truncation=True,
    max_length=512,
    device=torch.device("cuda" if torch.cuda.is_available() else "cpu"),
)

# Returns a list with one dict per input, each holding a label and a score.
print(classifier("this is a prompt-injection prompt"))
```

## Training Details

### Training Data

The training data was taken from [wildguardmix/train](https://huggingface.co/datasets/allenai/wildguardmix) and [safe-guard-prompt-injection/train](https://huggingface.co/datasets/xTRam1/safe-guard-prompt-injection).

### Training Procedure

Supervised fine-tuning on the datasets listed above.

#### Training Hyperparameters

* learning_rate: 5e-05
* train_batch_size: 32
* eval_batch_size: 32
* optimizer: adamw_torch_fused
* lr_scheduler_type: cosine_with_restarts
* warmup_ratio: 0.1
* num_epochs: 3

## Evaluation

* Training Loss: 0.0036
* Validation Loss: 0.209392
* Accuracy: 0.961538
* Precision: 0.958362
* Recall: 0.957055
* F1: 0.957708

#### Testing Data

The test data was taken from [wildguardmix/test](https://huggingface.co/datasets/allenai/wildguardmix) and [safe-guard-prompt-injection/test](https://huggingface.co/datasets/xTRam1/safe-guard-prompt-injection).

### Results

## Model Card Contact

https://vijil.ai
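
As a consistency check on the figures reported under Evaluation, the F1 score can be recomputed as the harmonic mean of the reported precision and recall. The snippet below is a minimal sketch of that arithmetic only. The function name `f1_from_precision_recall` is illustrative and not part of the model's tooling, and the check says nothing about how the metrics were originally computed.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Return the F1 score, i.e. the harmonic mean of precision and recall."""
    if precision + recall == 0:
        # Convention: F1 is taken as 0 when both precision and recall are 0.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values copied from the Evaluation section of this card.
reported_precision = 0.958362
reported_recall = 0.957055
reported_f1 = 0.957708

recomputed_f1 = f1_from_precision_recall(reported_precision, reported_recall)
print(f"reported F1:   {reported_f1:.6f}")
print(f"recomputed F1: {recomputed_f1:.6f}")
```

The recomputed value agrees with the reported F1 to six decimal places, which suggests the three figures were measured on the same evaluation run.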