vijil/mbert-prompt-injection

prompt-injection


vijilpd committed on Feb 4

Commit 116a65f (verified) · 1 Parent(s): 5ed5e56

Update README.md

Files changed (1)

README.md +97 -3


@@ -1,3 +1,97 @@
----
-license: apache-2.0
----

+---
+license: apache-2.0
+---
+# Model Card for Vijil Prompt Injection
+## Model Details
+### Model Description
+This model is a fine-tuned version of ModernBERT that detects prompt-injection attempts: prompts crafted to manipulate a language model into producing unintended outputs.
+- **Developed by:** Vijil AI
+- **License:** apache-2.0
+- **Finetuned from model:** [ModernBERT](https://huggingface.co/docs/transformers/en/model_doc/modernbert)
+## Uses
+Prompt injection attacks manipulate language models by inserting or altering prompts to trigger harmful or unintended responses.
+The vijil/mbert-prompt-injection model is designed to enhance security in language model applications by detecting prompt-injection attacks.
+## How to Get Started with the Model
+```python
+import torch
+from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline
+
+# Load the fine-tuned tokenizer and classifier from the Hugging Face Hub.
+tokenizer = AutoTokenizer.from_pretrained("vijil/mbert-prompt-injection")
+model = AutoModelForSequenceClassification.from_pretrained("vijil/mbert-prompt-injection")
+
+# Truncate inputs to 512 tokens and run on the GPU when one is available.
+classifier = pipeline(
+    "text-classification",
+    model=model,
+    tokenizer=tokenizer,
+    truncation=True,
+    max_length=512,
+    device=torch.device("cuda" if torch.cuda.is_available() else "cpu"),
+)
+
+# Returns a list with one {"label": ..., "score": ...} dict per input.
+print(classifier("this is a prompt-injection prompt"))
+```
+## Training Details
+### Training Data
+The training data was drawn from two datasets:
+- https://huggingface.co/datasets/allenai/wildguardmix
+- https://huggingface.co/datasets/xTRam1/safe-guard-prompt-injection
+### Training Procedure
+Supervised fine-tuning on the datasets listed above.
+#### Training Hyperparameters
+- learning_rate: 5e-05
+- train_batch_size: 32
+- eval_batch_size: 32
+- optimizer: adamw_torch_fused
+- lr_scheduler_type: cosine_with_restarts
+- warmup_ratio: 0.1
+- num_epochs: 3
+## Evaluation
+### Testing Data
+The evaluation data was taken from the same sources as the training data:
+- https://huggingface.co/datasets/allenai/wildguardmix
+- https://huggingface.co/datasets/xTRam1/safe-guard-prompt-injection
+### Results
+- Training Loss: 0.0036
+- Validation Loss: 0.209392
+- Accuracy: 0.961538
+- Precision: 0.958362
+- Recall: 0.957055
+- F1: 0.957708
+## Model Card Contact
+https://vijil.ai
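
The README's getting-started snippet prints the raw pipeline output, a list of `{"label": ..., "score": ...}` dicts. A minimal sketch of turning that output into a yes/no decision follows. The label name `"INJECTION"`, the helper `flag_injections`, and the 0.5 threshold are all assumptions for illustration; the model card does not state the label names, so check `model.config.id2label` on the loaded model before relying on them. The results here are mocked so the sketch runs without downloading the model.

```python
from typing import Dict, List, Union

# ASSUMPTION: hypothetical positive-class label; the README does not name
# the labels. Inspect model.config.id2label to find the real ones.
POSITIVE_LABEL = "INJECTION"


def flag_injections(
    results: List[Dict[str, Union[str, float]]],
    positive_label: str = POSITIVE_LABEL,
    threshold: float = 0.5,
) -> List[bool]:
    """Return True for each result predicted as positive with enough confidence."""
    return [
        r["label"] == positive_label and r["score"] >= threshold
        for r in results
    ]


# Mocked output in the shape a text-classification pipeline returns.
mock_results = [
    {"label": "INJECTION", "score": 0.98},
    {"label": "SAFE", "score": 0.91},
    {"label": "INJECTION", "score": 0.42},
]

flags = flag_injections(mock_results)
print(flags)
```

In a real application, `mock_results` would be replaced by `classifier(list_of_prompts)`, and the threshold would be tuned on held-out data to balance false positives against missed attacks.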
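
The hyperparameter names in the README look like those in the summary the `transformers` `Trainer` writes into model cards. As a rough sketch, they could map onto `TrainingArguments` keyword arguments as shown below. The mapping of `train_batch_size` to a per-device value, the `output_dir` name, and the use of `Trainer` at all are assumptions; the README does not describe the training script.

```python
# Keyword arguments for transformers.TrainingArguments, using the values
# listed in the README. ASSUMPTION: batch sizes are per device, and the
# output directory name is hypothetical.
training_kwargs = {
    "output_dir": "mbert-prompt-injection-run",  # hypothetical path
    "learning_rate": 5e-5,
    "per_device_train_batch_size": 32,
    "per_device_eval_batch_size": 32,
    "optim": "adamw_torch_fused",
    "lr_scheduler_type": "cosine_with_restarts",
    "warmup_ratio": 0.1,  # first 10% of steps ramp the learning rate up
    "num_train_epochs": 3,
}

# With transformers installed, the arguments would be built like this:
# from transformers import TrainingArguments
# args = TrainingArguments(**training_kwargs)

for name, value in training_kwargs.items():
    print(f"{name} = {value}")
```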
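
As a quick consistency check on the evaluation figures in the README, F1 is the harmonic mean of precision and recall, so it can be recomputed from the two reported values. The helper name `f1_score` below is illustrative and not taken from the model's training code.

```python
# Precision and recall as reported in the README's evaluation figures.
precision = 0.958362
recall = 0.957055


def f1_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if p + r == 0:
        return 0.0
    return 2 * p * r / (p + r)


f1 = f1_score(precision, recall)
print(f"{f1:.6f}")
```

The recomputed value agrees with the reported F1 of 0.957708 to six decimal places.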