ModernBERT Offensive Content Detector

This model is a fine-tuned version of answerdotai/ModernBERT-large for detecting offensive and harassing content in job postings.

Model Description

  • Architecture: ModernBERT-large
  • Task: Binary classification (Offensive vs Non-offensive)
  • Training Data: 2,000 labeled job postings with extensive data augmentation
  • Max Length: 2,048 tokens
  • Optimal Threshold: 0.400

Performance

On the test set at the optimal threshold of 0.400:

  • F1 Score: 0.957
  • Precision: 0.955
  • Recall: 0.958
  • Accuracy: 0.945
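
These scores reflect thresholded decisions (a posting is flagged when the class-1 probability is at least 0.400) rather than argmax labels. The snippet below is a minimal sketch of how threshold-based metrics of this kind can be computed with scikit-learn; y_true and offensive_probs are hypothetical placeholders, not the released evaluation data.

# Minimal sketch of threshold-based evaluation (illustrative only).
# y_true and offensive_probs are hypothetical placeholders.
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

threshold = 0.400
y_true = [0, 1, 1, 0, 1]                          # gold labels (1 = offensive)
offensive_probs = [0.10, 0.85, 0.47, 0.30, 0.92]  # model probabilities for class 1
y_pred = [int(p >= threshold) for p in offensive_probs]

print("F1:       ", f1_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall:   ", recall_score(y_true, y_pred))
print("Accuracy: ", accuracy_score(y_true, y_pred))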

Training Details

Training Techniques

  • Focal Loss (gamma=3.0) with class weights (see the sketch after this list)
  • Label smoothing (0.1)
  • Easy Data Augmentation (EDA) for minority class
  • Weighted random sampling for balanced batches
  • Gradient accumulation and mixed precision training
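
The training code itself is not published with this card; the following is a minimal sketch of how focal loss with class weights and label smoothing is commonly implemented, assuming class index 1 is the offensive class.

# Minimal sketch of focal loss with class weights and label smoothing
# (illustrative; the actual training script is not released here).
import torch
import torch.nn.functional as F

def focal_loss(logits, labels, gamma=3.0, class_weights=None, label_smoothing=0.1):
    # Per-example cross-entropy with label smoothing and optional class weights
    ce = F.cross_entropy(
        logits, labels,
        weight=class_weights,
        label_smoothing=label_smoothing,
        reduction="none",
    )
    # Down-weight easy examples by (1 - p_t)^gamma before averaging
    pt = torch.exp(-ce)
    return ((1.0 - pt) ** gamma * ce).mean()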

Hyperparameters

  • Learning rate: 2e-5 with cosine schedule
  • Batch size: 4 (effective 16 with gradient accumulation)
  • Epochs: 10
  • Warmup ratio: 0.15
  • Weight decay: 0.01
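
As a rough guide, these hyperparameters map onto a Hugging Face TrainingArguments configuration like the one below. The output directory is hypothetical; the actual training script is not published.

# Hypothetical TrainingArguments mirroring the hyperparameters above.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="modernbert-offensive",  # hypothetical path
    learning_rate=2e-5,
    lr_scheduler_type="cosine",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,      # effective batch size 16
    num_train_epochs=10,
    warmup_ratio=0.15,
    weight_decay=0.01,
    fp16=True,                          # mixed precision training
)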

Usage

import torch
import requests
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load model
model_name = "rexpository/modernbert-offensive-content-detector"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Load the optimal decision threshold shipped with the model
threshold_url = "https://huggingface.co/rexpository/modernbert-offensive-content-detector/resolve/main/optimal_threshold.json"
threshold_data = requests.get(threshold_url).json()
threshold = threshold_data["threshold"]

# Set device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)
model.eval()

# Predict function
def predict(text):
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=2048)
    inputs = {k: v.to(device) for k, v in inputs.items()}
    
    with torch.no_grad():
        outputs = model(**inputs)
        probs = torch.softmax(outputs.logits, dim=-1)
        offensive_prob = probs[0, 1].item()
    
    is_offensive = offensive_prob >= threshold
    
    return {
        "offensive": is_offensive,
        "confidence": offensive_prob,
        "label": "Offensive" if is_offensive else "Non-offensive"
    }

# Example
text = "We are looking for a software engineer to join our team."
result = predict(text)
print(f"Result: {result['label']} (confidence: {result['confidence']:.3f})")

Limitations

  • Trained specifically on job posting data; may not generalize well to other text types
  • Optimized threshold (0.4) prioritizes high recall, which may lead to more false positives
  • English language only
  • May be sensitive to certain technical terms that appear more frequently in offensive contexts

Ethical Considerations

This model is designed to help identify potentially offensive content in job postings. However:

  • It should be used as a screening tool, not as the sole decision maker
  • Human review is recommended for borderline cases
  • The model may have biases present in the training data
  • False positives should be carefully reviewed to avoid censoring legitimate content

Training Infrastructure

  • Hardware: NVIDIA GPU with CUDA support
  • Software: PyTorch 2.0+, Transformers 4.40+, Python 3.11

Citation

If you use this model, please cite:

@misc{modernbert-offensive-detector,
  author = {Your Name},
  title = {ModernBERT Offensive Content Detector},
  year = {2024},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/rexpository/modernbert-offensive-content-detector}}
}