ModernBERT Offensive Content Detector

This model is a fine-tuned version of answerdotai/ModernBERT-large for detecting offensive and harassing content in job postings.

Model Description

  • Architecture: ModernBERT-large
  • Task: Binary classification (Offensive vs Non-offensive)
  • Training Data: 2,000 labeled job postings with extensive data augmentation
  • Max Length: 2,048 tokens
  • Optimal Threshold: 0.400

Performance

On the test set at the optimal threshold of 0.400:

  • F1 Score: 0.957
  • Precision: 0.955
  • Recall: 0.958
  • Accuracy: 0.945
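
These scores reflect thresholded decisions (a posting is flagged when the class-1 probability is at least 0.400) rather than argmax labels. The snippet below is a minimal sketch of how threshold-based metrics of this kind can be computed with scikit-learn; y_true and offensive_probs are hypothetical placeholders, not the released evaluation data.

# Minimal sketch of threshold-based evaluation (illustrative only).
# y_true and offensive_probs are hypothetical placeholders.
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

threshold = 0.400
y_true = [0, 1, 1, 0, 1]                          # gold labels (1 = offensive)
offensive_probs = [0.10, 0.85, 0.47, 0.30, 0.92]  # model probabilities for class 1
y_pred = [int(p >= threshold) for p in offensive_probs]

print("F1:       ", f1_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall:   ", recall_score(y_true, y_pred))
print("Accuracy: ", accuracy_score(y_true, y_pred))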

Training Details

Training Techniques

  • Focal Loss (gamma=3.0) with class weights (see the sketch after this list)
  • Label smoothing (0.1)
  • Easy Data Augmentation (EDA) for minority class
  • Weighted random sampling for balanced batches
  • Gradient accumulation and mixed precision training
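
The training code itself is not published with this card; the following is a minimal sketch of how focal loss with class weights and label smoothing is commonly implemented, assuming class index 1 is the offensive class.

# Minimal sketch of focal loss with class weights and label smoothing
# (illustrative; the actual training script is not released here).
import torch
import torch.nn.functional as F

def focal_loss(logits, labels, gamma=3.0, class_weights=None, label_smoothing=0.1):
    # Per-example cross-entropy with label smoothing and optional class weights
    ce = F.cross_entropy(
        logits, labels,
        weight=class_weights,
        label_smoothing=label_smoothing,
        reduction="none",
    )
    # Down-weight easy examples by (1 - p_t)^gamma before averaging
    pt = torch.exp(-ce)
    return ((1.0 - pt) ** gamma * ce).mean()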

Hyperparameters

  • Learning rate: 2e-5 with cosine schedule
  • Batch size: 4 (effective 16 with gradient accumulation)
  • Epochs: 10
  • Warmup ratio: 0.15
  • Weight decay: 0.01
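
As a rough guide, these hyperparameters map onto a Hugging Face TrainingArguments configuration like the one below. The output directory is hypothetical; the actual training script is not published.

# Hypothetical TrainingArguments mirroring the hyperparameters above.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="modernbert-offensive",  # hypothetical path
    learning_rate=2e-5,
    lr_scheduler_type="cosine",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,      # effective batch size 16
    num_train_epochs=10,
    warmup_ratio=0.15,
    weight_decay=0.01,
    fp16=True,                          # mixed precision training
)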

Usage

import torch
import requests
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load model
model_name = "rexpository/modernbert-offensive-content-detector"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Load the optimal decision threshold shipped with the model
threshold_url = "https://huggingface.co/rexpository/modernbert-offensive-content-detector/resolve/main/optimal_threshold.json"
threshold_data = requests.get(threshold_url).json()
threshold = threshold_data["threshold"]

# Set device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)
model.eval()

# Predict function
def predict(text):
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=2048)
    inputs = {k: v.to(device) for k, v in inputs.items()}
    
    with torch.no_grad():
        outputs = model(**inputs)
        probs = torch.softmax(outputs.logits, dim=-1)
        offensive_prob = probs[0, 1].item()
    
    is_offensive = offensive_prob >= threshold
    
    return {
        "offensive": is_offensive,
        "confidence": offensive_prob,
        "label": "Offensive" if is_offensive else "Non-offensive"
    }

# Example
text = "We are looking for a software engineer to join our team."
result = predict(text)
print(f"Result: {result['label']} (confidence: {result['confidence']:.3f})")

Limitations

  • Trained specifically on job posting data; may not generalize well to other text types
  • Optimized threshold (0.4) prioritizes high recall, which may lead to more false positives
  • English language only
  • May be sensitive to certain technical terms that appear more frequently in offensive contexts

Ethical Considerations

This model is designed to help identify potentially offensive content in job postings. However:

  • It should be used as a screening tool, not as the sole decision maker
  • Human review is recommended for borderline cases
  • The model may have biases present in the training data
  • False positives should be carefully reviewed to avoid censoring legitimate content

Training Infrastructure

  • Hardware: NVIDIA GPU with CUDA support
  • Software: PyTorch 2.0+, Transformers 4.40+, Python 3.11

Citation

If you use this model, please cite:

@misc{modernbert-offensive-detector,
  author = {Your Name},
  title = {ModernBERT Offensive Content Detector},
  year = {2024},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/rexpository/modernbert-offensive-content-detector}}
}