🧠 Model Card for aamoshdahal/email-phishing-distilbert-finetuned
This model is a fine-tuned version of DistilBERT (distilbert-base-uncased) trained specifically for phishing email detection. It classifies email content into two categories: phishing and legitimate.
The model was trained on the Phishing Email Dataset (Kaggle) and evaluated on the cybersectony/PhishingEmailDetectionv2.0 dataset.
It is optimized for:
- High recall to catch most phishing attempts
- High precision to reduce false positives
- Fast inference via the lightweight DistilBERT architecture
- Interpretability, with support for token-level explanations using transformers-interpret
This model is ideal for security tools, email scanning systems, awareness training platforms, and research on adversarial phishing attacks.
Model Details
Model Description
This is a fine-tuned DistilBERT model trained to classify email content as either phishing or legitimate. It was developed as part of a cybersecurity research project on detecting phishing attempts in email messages with a fine-tuned transformer model.
- Developed by: @aamoshdahal
- Model type: DistilBERT (Transformer-based sequence classifier)
- Language(s): English
- Finetuned from model: distilbert-base-uncased
Intended Uses & Users
This model is intended to be used as a lightweight and reliable phishing email detector. It can be integrated into:
- Email clients or gateways to filter phishing emails in real time
- Security software or firewalls as an additional phishing classifier
- Educational tools for training users to recognize phishing attempts
- Research environments to study adversarial or evolving phishing tactics
Foreseeable Users:
- Cybersecurity professionals
- Software developers integrating NLP into email platforms
- Researchers working on phishing detection
Foreseeable Impact:
- Improved early detection of phishing attacks
- Reduced exposure to credential theft and fraud
- Increased public understanding of phishing strategies
How to Get Started with the Model
You can use the code snippet below to quickly load the fine-tuned model and make predictions on any email content:
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
from transformers_interpret import SequenceClassificationExplainer

# Load the model and tokenizer from the Hugging Face Hub
model_id = "aamoshdahal/email-phishing-distilbert-finetuned"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

# Use the GPU if one is available and switch to inference mode (disables dropout)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
model.eval()

# Example email for prediction
email = """Dear user,
We detected suspicious activity on your account. Please verify your identity immediately by clicking the link below to avoid suspension.
[Phishing Link Here]
Thank you,
Security Team"""

# Tokenize; truncation keeps the input within the model's maximum length (512 tokens)
encoded_input = tokenizer(email, return_tensors="pt", truncation=True, padding=True).to(device)

# Make prediction without tracking gradients
with torch.no_grad():
    outputs = model(**encoded_input)
probs = torch.nn.functional.softmax(outputs.logits, dim=-1)

# Output prediction (index 0 = legitimate, 1 = phishing, matching the label encoding)
labels = ["legitimate", "phishing"]
pred_label = labels[probs.argmax(dim=-1).item()]
confidence = probs.max().item()
print(f"Prediction: {pred_label} ({confidence:.2%} confidence)")

# Token-level attributions with transformers-interpret.
# class_name must match a label in model.config.id2label; assuming the default
# LABEL_0/LABEL_1 names, "LABEL_0" is index 0 (legitimate).
# Omit class_name to explain the predicted class instead.
explainer = SequenceClassificationExplainer(model=model, tokenizer=tokenizer)
word_attributions = explainer(email, class_name="LABEL_0")
explainer.visualize()
```
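The post-processing in the snippet above (a softmax over the two logits, then picking the more probable label) can be reproduced without PyTorch, which makes it easy to unit-test or to experiment with a decision threshold. This is a minimal sketch, assuming index 0 = legitimate and index 1 = phishing as in the label encoding under Preprocessing; the function names and the `threshold` parameter are hypothetical and are not part of the model or of transformers.

```python
import math
from typing import List, Sequence, Tuple

# Assumed label order, following the card's encoding: 0 = legitimate, 1 = phishing
LABELS = ("legitimate", "phishing")


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits: Sequence[float], threshold: float = 0.5) -> Tuple[str, float]:
    """Hypothetical helper returning (label, P(phishing)).

    Flags an email as phishing when P(phishing) >= threshold. With threshold=0.5
    this matches taking the argmax, except that an exact tie goes to phishing here.
    """
    p_phishing = softmax(logits)[1]
    label = LABELS[1] if p_phishing >= threshold else LABELS[0]
    return label, p_phishing


if __name__ == "__main__":
    # Made-up logits for illustration, not real model output
    label, p = classify([-2.0, 3.0])
    print(f"{label} (P(phishing) = {p:.2%})")
```

Raising `threshold` above 0.5 trades recall for precision; any such value would need to be chosen on held-out data rather than taken from this sketch.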
🏋️‍♂️ Training Details
📦 Training Data
The model was fine-tuned on a balanced phishing email dataset compiled from multiple public sources, including:
- Enron Email Dataset
- CEAS 2008 Phishing Corpus
- Ling-Spam Dataset
- SpamAssassin
- Nazario Phishing Emails
- Nigerian Fraud Email Dataset
These sources were aggregated and preprocessed in the Phishing Email Dataset on Kaggle. Each entry includes a text_combined field that concatenates the subject line, body text, sender address, and timestamp to give the classifier full context.
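The exact layout of text_combined is defined by the Kaggle dataset, not by this card, so the sketch below only illustrates the idea of joining the four fields into one string. The field order, the separator, and the function name `build_text_combined` are assumptions for illustration. When scoring new emails, building the input in the same way as the training data was built is probably the safest choice.

```python
from typing import Optional


def build_text_combined(
    subject: Optional[str],
    body: Optional[str],
    sender: Optional[str],
    timestamp: Optional[str],
    sep: str = " ",
) -> str:
    """Hypothetical helper: join subject, body, sender and timestamp into one string.

    Missing (None or blank) fields are skipped so no stray separators remain.
    Order and separator are assumptions, not the dataset's documented format.
    """
    parts = [subject, body, sender, timestamp]
    return sep.join(p.strip() for p in parts if p and p.strip())


if __name__ == "__main__":
    print(build_text_combined(
        "Account suspended",
        "Please verify your identity.",
        "security@example.com",
        "2024-01-01 09:00",
    ))
```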
⚙️ Training Procedure
This model was fine-tuned using the Hugging Face 🤗 Trainer API with the following configuration:
- Base model: distilbert-base-uncased
- Architecture: Transformer-based sequence classifier (DistilBertForSequenceClassification)
- Epochs: 3
- Batch size: 16
- Learning rate: 2e-5
- Weight decay: 0.01
- Evaluation strategy: Per epoch
- Monitoring: All metrics logged via Weights & Biases (W&B)
The model was trained using a Tesla A100 GPU (40GB VRAM) on Google Colab Pro.
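Unless overridden, the 🤗 Trainer optimizes with AdamW and a learning rate that decays linearly to zero with no warmup, so the configuration above fixes the schedule once the number of training examples is known. The card reports neither the training-set size nor whether the default scheduler was kept, so the following is only a worked example under those assumptions, using a hypothetical 10,000 training examples on a single device without gradient accumulation.

```python
import math


def training_steps(num_examples: int, batch_size: int = 16, epochs: int = 3) -> int:
    """Optimizer steps for single-device training without gradient accumulation.

    The last partial batch is kept (the Trainer's default), hence the ceiling.
    """
    return epochs * math.ceil(num_examples / batch_size)


def linear_lr(step: int, total_steps: int, base_lr: float = 2e-5) -> float:
    """Linear decay from base_lr to 0 with no warmup (the Trainer's default schedule)."""
    return base_lr * max(0.0, (total_steps - step) / total_steps)


if __name__ == "__main__":
    n = 10_000  # hypothetical training-set size; the card does not report it
    total = training_steps(n)
    print(total)  # 3 * ceil(10000 / 16) = 3 * 625 = 1875
    for step in (0, total // 2, total):
        print(step, linear_lr(step, total))
```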
Preprocessing
- Duplicate and null record removal
- Lowercasing and whitespace cleanup
- Tokenization using DistilBertTokenizer
- Label encoding (0 = legitimate, 1 = phishing)
- Random undersampling to ensure class balance
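The preprocessing steps listed above can be sketched in plain Python. This is an illustration under assumptions, not the author's pipeline: records are assumed to be `(text, label)` pairs with string labels "legitimate"/"phishing", steps run in the listed order (null and exact-duplicate removal before normalization), and undersampling draws a random subset of each class down to the smallest class size with a fixed, arbitrary seed. Tokenization with DistilBertTokenizer is left out.

```python
import random
import re
from typing import Iterable, List, Optional, Tuple

LABEL2ID = {"legitimate": 0, "phishing": 1}  # encoding stated in the card


def normalize(text: str) -> str:
    """Lowercase and collapse runs of whitespace into single spaces."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def preprocess(
    records: Iterable[Tuple[Optional[str], Optional[str]]], seed: int = 42
) -> List[Tuple[str, int]]:
    """Hypothetical pipeline: drop nulls and duplicates, normalize, encode, undersample."""
    seen = set()
    cleaned = []
    for text, label in records:
        if text is None or label is None:  # null record removal
            continue
        if (text, label) in seen:  # exact duplicate removal
            continue
        seen.add((text, label))
        cleaned.append((normalize(text), LABEL2ID[label]))  # cleanup + label encoding

    # Random undersampling: shrink every class to the size of the smallest one
    by_class = {0: [], 1: []}
    for item in cleaned:
        by_class[item[1]].append(item)
    n = min(len(items) for items in by_class.values())
    rng = random.Random(seed)
    balanced = []
    for items in by_class.values():
        balanced.extend(rng.sample(items, n))
    rng.shuffle(balanced)
    return balanced


if __name__ == "__main__":
    data = [
        ("Verify  your ACCOUNT now", "phishing"),
        ("Lunch at noon?", "legitimate"),
        ("Team meeting moved", "legitimate"),
        ("Lunch at noon?", "legitimate"),  # duplicate, dropped
        (None, "phishing"),  # null, dropped
    ]
    print(preprocess(data))  # one example per class after undersampling
```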
Evaluation Results
For updated results and runs, see the public Weights & Biases (W&B) project and its Full Report.
The fine-tuned DistilBERT model was evaluated on a test dataset containing both phishing and legitimate emails. Below is a summary of its performance compared with baseline models (DistilBERT and BERT without fine-tuning):
Fine-Tuned DistilBERT (Best Performing)
Epoch | Training Loss | Validation Loss | Accuracy | Precision | Recall | F1 Score | ROC AUC |
---|---|---|---|---|---|---|---|
1 | 0.0323 | 0.0243 | 0.9936 | 0.9916 | 0.9961 | 0.9939 | 0.9996 |
2 | 0.0083 | 0.0297 | 0.9938 | 0.9968 | 0.9912 | 0.9940 | 0.9998 |
3 | 0.0044 | 0.0275 | 0.9951 | 0.9959 | 0.9947 | 0.9953 | 0.9997 |
- Test Set Summary:
  - Accuracy: 96.62%
  - Precision: 95.90%
  - Recall: 97.46%
  - F1 Score: 96.67%
  - ROC AUC: 0.9953
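For reference, accuracy, precision, recall and F1 are defined from confusion-matrix counts (TP, FP, TN, FN) with phishing as the positive class. The sketch below computes them and checks that the reported F1 follows from the reported precision and recall. ROC AUC needs the raw scores rather than counts, so it is not covered. Reporting an undefined ratio (division by zero) as 0 is a convention chosen here; it matches the value scikit-learn returns by default, with a warning.

```python
from typing import Dict


def _ratio(num: float, den: float) -> float:
    # Undefined ratios (den == 0) are reported as 0.0
    return num / den if den else 0.0


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 with phishing as the positive class."""
    precision = _ratio(tp, tp + fp)
    recall = _ratio(tp, tp + fn)
    return {
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f1": _ratio(2 * precision * recall, precision + recall),
    }


def f1_from(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    return _ratio(2 * precision * recall, precision + recall)


if __name__ == "__main__":
    # Consistency check on the reported test-set figures
    print(f"{f1_from(0.9590, 0.9746):.4f}")  # 0.9667, matching the reported 96.67%
```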
⚠️ Raw DistilBERT (Not Fine-Tuned)
- Accuracy: 49.57%
- Precision: 0.00%
- Recall: 0.00%
- F1 Score: 0.00%
- ROC AUC: 0.5694
⚠️ Raw BERT (Not Fine-Tuned)
- Accuracy: 49.57%
- Precision: 0.00%
- Recall: 0.00%
- F1 Score: 0.00%
- ROC AUC: 0.4984
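Precision, recall and F1 of exactly zero mean that the baselines produced no true positives for the phishing class. One way such numbers arise (an illustration, not a reconstruction of the actual evaluation) is a classifier that labels every email as legitimate: its accuracy then equals the share of legitimate emails in the test set. The test-set size and class split below are hypothetical, chosen so that 49.57% of the emails are legitimate.

```python
def constant_legitimate_metrics(labels):
    """Metrics for a hypothetical classifier that predicts legitimate (0) for every email.

    labels: iterable of true labels, 0 = legitimate, 1 = phishing.
    """
    labels = list(labels)
    tp = 0  # never predicts phishing, so no true positives
    fp = 0  # ...and no false positives either
    tn = labels.count(0)
    fn = labels.count(1)
    accuracy = (tp + tn) / len(labels)
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # undefined, reported as 0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return accuracy, precision, recall, f1


if __name__ == "__main__":
    # Hypothetical test set: 4,957 legitimate and 5,043 phishing emails
    y_true = [0] * 4957 + [1] * 5043
    print(constant_legitimate_metrics(y_true))  # (0.4957, 0.0, 0.0, 0.0)
```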