Fine-tuned Llama-Prompt-Guard-2-86M

This is a fine-tuned version of the Meta Llama-Prompt-Guard-2-86M model for prompt injection detection, developed by the MyFi team.

Model Description

  • Base Model: meta-llama/Llama-Prompt-Guard-2-86M
  • Task: Binary classification (benign vs malicious prompts)
  • Architecture: mDeBERTa-base with custom classifier head
  • Fine-tuning: Custom dataset with balanced benign/malicious samples
  • Organization: MyFi

Usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model_name = "myfi/llama-prompt-guard-finetuned"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Classify text
text = "How do I hack a computer?"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():  # inference only, so skip gradient tracking
    outputs = model(**inputs)

# Apply temperature scaling (recommended: 3.0)
temperature = 3.0
scaled_logits = outputs.logits / temperature
probabilities = torch.softmax(scaled_logits, dim=-1)

# Get prediction (assumes index 0 = benign and index 1 = malicious; check model.config.id2label)
benign_prob = probabilities[0][0].item()
malicious_prob = probabilities[0][1].item()
prediction_result = "MALICIOUS" if malicious_prob > 0.5 else "BENIGN"

print(f"Prediction: {prediction_result}")
print(f"Benign Probability: {benign_prob:.4f}")
print(f"Malicious Probability: {malicious_prob:.4f}")

Training Details

  • Dataset: Custom dataset with balanced benign/malicious samples
  • Training Method: Fine-tuning with custom loss function
  • Temperature Scaling: Recommended temperature = 3.0
  • Classification Threshold: Default = 0.5
  • Organization: MyFi
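
The interaction between the recommended temperature and the default threshold can be sketched without loading the model. The example below is a minimal illustration in plain Python, not the training or calibration code: the helper names `softmax` and `classify` and the example logits are hypothetical, and index 1 is assumed to be the malicious class, as in the usage snippet.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def classify(logits, temperature=3.0, threshold=0.5):
    """Return (label, malicious_probability); index 1 is assumed malicious."""
    malicious_prob = softmax(logits, temperature)[1]
    label = "MALICIOUS" if malicious_prob > threshold else "BENIGN"
    return label, malicious_prob

# Hypothetical logits for illustration, not produced by the real model.
example_logits = [-1.5, 1.5]
for t in (1.0, 3.0):
    label, prob = classify(example_logits, temperature=t)
    print(f"T={t}: {label} ({prob:.4f})")
# T=1.0: MALICIOUS (0.9526)
# T=3.0: MALICIOUS (0.7311)
```

With two classes and a threshold of 0.5, the label depends only on which logit is larger, so the temperature changes the reported confidence but not the decision. It starts to affect decisions once the threshold is moved away from 0.5.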

Performance

The model is designed to detect prompt injection attempts and malicious queries while allowing legitimate requests to pass through.
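
One way to quantify the trade-off described above, between catching malicious prompts and letting legitimate ones through, is to compute recall and false positive rate on a labelled evaluation set. The following is a minimal sketch under stated assumptions: the function names are hypothetical, labels use 1 for malicious and 0 for benign, and the scores are made-up values for illustration, not results from this model.

```python
def confusion_counts(labels, scores, threshold=0.5):
    """Count (tp, fp, tn, fn); labels use 1 = malicious, 0 = benign."""
    tp = fp = tn = fn = 0
    for label, score in zip(labels, scores):
        predicted = 1 if score > threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

def summarize(labels, scores, threshold=0.5):
    """Return precision, recall, and false positive rate at a threshold."""
    tp, fp, tn, fn = confusion_counts(labels, scores, threshold)
    return {
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else 0.0,
    }

# Made-up malicious-class probabilities, for illustration only.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.91, 0.62, 0.40, 0.55, 0.10, 0.05]
print(summarize(labels, scores, threshold=0.5))
```

Sweeping `threshold` over a range of values on real evaluation data shows how many benign prompts get blocked for each gain in detection.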

Limitations

  • May produce false positives or false negatives on edge cases
  • Performance depends on the quality and distribution of training data
  • Should be used as part of a broader security strategy

License

This model is licensed under the MIT License.

Organization

This model is maintained by MyFi, a company focused on AI and ML solutions.

Citation

If you use this model, please cite the original Llama-Prompt-Guard-2-86M paper and mention that this is a fine-tuned version by MyFi.
