ModernBERT Offensive Content Detector
This model is a fine-tuned version of answerdotai/ModernBERT-large for detecting offensive and harassing content in job postings.
Model Description
- Architecture: ModernBERT-large
- Task: Binary classification (Offensive vs Non-offensive)
- Training Data: 2,000 labeled job postings with extensive data augmentation
- Max Length: 2,048 tokens
- Optimal Threshold: 0.400
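The key properties above can be checked directly against the published checkpoint. A minimal sketch (the exact `id2label` mapping and tokenizer limit depend on how the checkpoint was exported):

```python
from transformers import AutoConfig, AutoTokenizer

# Inspect the fine-tuned checkpoint's configuration.
model_name = "rexpository/modernbert-offensive-content-detector"
config = AutoConfig.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

print(config.model_type)           # expected: "modernbert"
print(config.num_labels)           # expected: 2 (binary classification)
print(config.id2label)             # label names, if stored in the checkpoint
print(tokenizer.model_max_length)  # upper bound; this card truncates at 2,048
```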
Performance
On the held-out test set, at the optimal threshold of 0.400:
- F1 Score: 0.957
- Precision: 0.955
- Recall: 0.958
- Accuracy: 0.945
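These figures are self-reported on the author's test set. The sketch below only illustrates how threshold-based metrics of this kind are computed; the arrays are placeholders, not the actual test data:

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Placeholder arrays standing in for true labels and the model's
# offensive-class probabilities on a held-out test set.
y_true = np.array([0, 0, 1, 1, 0, 1])
offensive_probs = np.array([0.05, 0.62, 0.91, 0.38, 0.12, 0.47])

threshold = 0.400  # the reported optimal threshold
y_pred = (offensive_probs >= threshold).astype(int)

print("F1:       ", f1_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall:   ", recall_score(y_true, y_pred))
print("Accuracy: ", accuracy_score(y_true, y_pred))
```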
Training Details
Training Techniques
- Focal Loss (gamma=3.0) with class weights (sketched after this list)
- Label smoothing (0.1)
- Easy Data Augmentation (EDA) for minority class
- Weighted random sampling for balanced batches
- Gradient accumulation and mixed precision training
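The training script is not included with this card, so the following is only an illustrative sketch of a focal loss with class weights and label smoothing, as listed above; the exact formulation used during fine-tuning may differ:

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=3.0, class_weights=None, label_smoothing=0.1):
    # Per-example cross-entropy with class weights and label smoothing.
    ce = F.cross_entropy(
        logits, targets,
        weight=class_weights,
        label_smoothing=label_smoothing,
        reduction="none",
    )
    # p_t approximates the probability assigned to the true class;
    # the (1 - p_t)^gamma factor up-weights hard examples.
    pt = torch.exp(-ce)
    return ((1.0 - pt) ** gamma * ce).mean()

# Example: binary logits with the minority (offensive) class up-weighted.
logits = torch.randn(4, 2)
targets = torch.tensor([0, 1, 0, 1])
class_weights = torch.tensor([1.0, 3.0])  # hypothetical weights
loss = focal_loss(logits, targets, class_weights=class_weights)
```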
Hyperparameters
- Learning rate: 2e-5 with cosine schedule
- Batch size: 4 (effective 16 with gradient accumulation)
- Epochs: 10
- Warmup ratio: 0.15
- Weight decay: 0.01
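For reference, these settings map onto Hugging Face `TrainingArguments` roughly as follows; this is a reconstruction from the list above, not the original training configuration:

```python
from transformers import TrainingArguments

# Reconstructed from the hyperparameter list; not the original script.
training_args = TrainingArguments(
    output_dir="modernbert-offensive-detector",
    learning_rate=2e-5,
    lr_scheduler_type="cosine",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,   # effective batch size of 16
    num_train_epochs=10,
    warmup_ratio=0.15,
    weight_decay=0.01,
    fp16=True,                       # mixed precision training
)
```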
Usage
```python
import torch
import requests
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load model
model_name = "rexpository/modernbert-offensive-content-detector"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Load optimal threshold
threshold_url = "https://huggingface.co/rexpository/modernbert-offensive-content-detector/resolve/main/optimal_threshold.json"
threshold_data = requests.get(threshold_url).json()
threshold = threshold_data["threshold"]

# Set device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)
model.eval()

# Predict function
def predict(text):
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=2048)
    inputs = {k: v.to(device) for k, v in inputs.items()}
    with torch.no_grad():
        outputs = model(**inputs)
    probs = torch.softmax(outputs.logits, dim=-1)
    offensive_prob = probs[0, 1].item()
    is_offensive = offensive_prob >= threshold
    return {
        "offensive": is_offensive,
        "confidence": offensive_prob,
        "label": "Offensive" if is_offensive else "Non-offensive",
    }

# Example
text = "We are looking for a software engineer to join our team."
result = predict(text)
print(f"Result: {result['label']} (confidence: {result['confidence']:.3f})")
```
Limitations
- Trained specifically on job posting data; may not generalize well to other text types
- Optimized threshold (0.4) prioritizes high recall, which may lead to more false positives (see the threshold sketch after this list)
- English language only
- May be sensitive to certain technical terms that appear more frequently in offensive contexts
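If false positives are a concern, the decision threshold can be raised above the published 0.400 at the cost of recall. A hypothetical stricter setting, reusing `predict` from the Usage section (the value 0.60 is arbitrary):

```python
# Hypothetical stricter cut-off to reduce false positives (at the cost
# of recall); 0.400 is the published default, 0.60 is arbitrary.
strict_threshold = 0.60

result = predict("Join our fast-paced engineering team.")
flagged = result["confidence"] >= strict_threshold
print(f"flagged: {flagged} (p_offensive={result['confidence']:.3f})")
```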
Ethical Considerations
This model is designed to help identify potentially offensive content in job postings. However:
- It should be used as a screening tool, not as the sole decision maker
- Human review is recommended for borderline cases (a simple triage sketch follows this list)
- The model may have biases present in the training data
- False positives should be carefully reviewed to avoid censoring legitimate content
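One way to put the human-review recommendation into practice is to route predictions that fall near the threshold to a reviewer. A sketch with an assumed review band of ±0.15 around the threshold (both the band and the helper are illustrative, not part of the released code):

```python
# Assumed review band around the decision threshold; postings scoring
# inside it are routed to a human reviewer instead of being auto-labeled.
REVIEW_BAND = 0.15

def triage(text):
    result = predict(text)  # predict() and threshold from the Usage section
    if abs(result["confidence"] - threshold) <= REVIEW_BAND:
        return {"action": "human_review", **result}
    return {"action": "auto_label", **result}
```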
Training Infrastructure
- Hardware: NVIDIA GPU with CUDA support
- Software: PyTorch 2.0+, Transformers 4.40+, Python 3.11
Citation
If you use this model, please cite:
@misc{modernbert-offensive-detector,
author = {Your Name},
title = {ModernBERT Offensive Content Detector},
year = {2024},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/rexpository/modernbert-offensive-content-detector}}
}