Literary Content Analysis - DeBERTa v3 Small
A fine-tuned DeBERTa-v3-small model that classifies literary and textual content into 7 categories of explicitness, supporting content classification for literary research, digital humanities, and content curation.
Model Description
This model analyzes textual content across 7 categories of explicitness, providing researchers, librarians, and content curators with nuanced classification capabilities for literary and media analysis:
- NON-EXPLICIT: Clean, family-friendly content
- SUGGESTIVE: Mild innuendo or romantic themes without explicit detail
- SEXUAL-REFERENCE: Mentions of sexual topics without graphic description
- EXPLICIT-SEXUAL: Graphic sexual content and detailed intimate scenes
- EXPLICIT-OFFENSIVE: Profanity, crude language, and offensive content
- EXPLICIT-VIOLENT: Violent or disturbing content
- EXPLICIT-DISCLAIMER: Content warnings and age restriction notices
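For downstream processing it can help to keep the seven labels in a fixed structure. A minimal sketch (the label list mirrors the categories above; the authoritative id-to-label mapping lives in the model's `config.json` as `model.config.id2label`, and the `is_explicit` grouping is an illustrative policy, not part of the model):

```python
# Label set as documented above; check model.config.id2label for the
# authoritative order used by the model itself.
LABELS = [
    "NON-EXPLICIT",
    "SUGGESTIVE",
    "SEXUAL-REFERENCE",
    "EXPLICIT-SEXUAL",
    "EXPLICIT-OFFENSIVE",
    "EXPLICIT-VIOLENT",
    "EXPLICIT-DISCLAIMER",
]

# Illustrative grouping for simple filtering policies. Note that this
# treats EXPLICIT-DISCLAIMER (a content warning, not explicit content
# itself) as explicit; adjust to your use case.
EXPLICIT_LABELS = {label for label in LABELS if label.startswith("EXPLICIT-")}

def is_explicit(label: str) -> bool:
    """True for the four EXPLICIT-* categories."""
    return label in EXPLICIT_LABELS
```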
Model Performance
Test Set Results (2,000 samples):
- Accuracy: 77.3%
- Macro F1: 0.709
- Weighted F1: 0.779
Per-Class Performance:
| Category | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| EXPLICIT-DISCLAIMER | 0.864 | 1.000 | 0.927 | 19 |
| EXPLICIT-SEXUAL | 0.927 | 0.909 | 0.918 | 514 |
| EXPLICIT-OFFENSIVE | 0.749 | 0.877 | 0.808 | 414 |
| NON-EXPLICIT | 0.880 | 0.696 | 0.777 | 683 |
| SEXUAL-REFERENCE | 0.637 | 0.679 | 0.658 | 212 |
| EXPLICIT-VIOLENT | 0.500 | 0.458 | 0.478 | 24 |
| SUGGESTIVE | 0.333 | 0.500 | 0.400 | 134 |
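The aggregate scores follow directly from the per-class table: macro F1 is the unweighted mean of the seven F1 scores, while weighted F1 weights each class by its support. A quick check in plain Python (small discrepancies versus the reported numbers come from the three-decimal rounding in the table):

```python
# (F1, support) pairs taken from the per-class table above
per_class = {
    "EXPLICIT-DISCLAIMER": (0.927, 19),
    "EXPLICIT-SEXUAL":     (0.918, 514),
    "EXPLICIT-OFFENSIVE":  (0.808, 414),
    "NON-EXPLICIT":        (0.777, 683),
    "SEXUAL-REFERENCE":    (0.658, 212),
    "EXPLICIT-VIOLENT":    (0.478, 24),
    "SUGGESTIVE":          (0.400, 134),
}

f1s = [f1 for f1, _ in per_class.values()]
total_support = sum(n for _, n in per_class.values())  # 2,000 test samples

macro_f1 = sum(f1s) / len(f1s)
weighted_f1 = sum(f1 * n for f1, n in per_class.values()) / total_support

print(f"Macro F1:    {macro_f1:.3f}")     # 0.709, matching the reported score
print(f"Weighted F1: {weighted_f1:.3f}")  # ~0.780 from rounded table values (0.779 reported)
```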
Training Data
Dataset Size: 124,144 samples across 7 categories
- Original categorized paragraphs: 123,144 samples
- Generated disclaimers: 1,000 samples
Data Sources:
- Diverse text content including books, articles, reviews, and discussions
- Synthetic disclaimer examples generated for comprehensive coverage
- Cross-media content types (books, games, videos, podcasts, art, etc.)
Class Distribution:
- NON-EXPLICIT: 43,470 samples (35.0%)
- EXPLICIT-SEXUAL: 31,696 samples (25.5%)
- EXPLICIT-OFFENSIVE: 25,561 samples (20.6%)
- SEXUAL-REFERENCE: 12,379 samples (10.0%)
- SUGGESTIVE: 8,313 samples (6.7%)
- EXPLICIT-VIOLENT: 1,552 samples (1.3%)
- EXPLICIT-DISCLAIMER: 1,173 samples (0.9%)
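The training section notes that class weights were applied, but the exact scheme is not documented. A common choice for imbalance like this is inverse-frequency weighting, sketched here from the class counts above (this is an assumed formulation, not the model's confirmed recipe):

```python
# Class counts from the distribution above
counts = {
    "NON-EXPLICIT":        43470,
    "EXPLICIT-SEXUAL":     31696,
    "EXPLICIT-OFFENSIVE":  25561,
    "SEXUAL-REFERENCE":    12379,
    "SUGGESTIVE":           8313,
    "EXPLICIT-VIOLENT":     1552,
    "EXPLICIT-DISCLAIMER":  1173,
}

total = sum(counts.values())   # 124,144 samples
num_classes = len(counts)      # 7

# Inverse-frequency weighting: weight_c = N / (K * n_c), so rare classes
# (e.g. EXPLICIT-DISCLAIMER) get proportionally larger loss weights.
weights = {label: total / (num_classes * n) for label, n in counts.items()}

for label, w in sorted(weights.items(), key=lambda kv: kv[1]):
    print(f"{label:20s} {w:6.2f}")
```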
Usage
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

# Load model and tokenizer
model = AutoModelForSequenceClassification.from_pretrained("Mitchins/deberta-v3-small-literary-explicit-classifier")
tokenizer = AutoTokenizer.from_pretrained("Mitchins/deberta-v3-small-literary-explicit-classifier")

# Create classification pipeline
classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)

# Literary analysis example
literary_texts = [
    "The author's exploration of human sexuality reflects the broader themes of Victorian literature.",
    "This passage contains Shakespearean double entendres typical of the period.",
    "Content advisory: This edition includes uncensored material from the original manuscript.",
    "A bildungsroman examining themes of coming-of-age and moral development."
]

results = classifier(literary_texts)
for text, result in zip(literary_texts, results):
    print(f"Text: {text}")
    print(f"Literary Category: {result['label']}")
    print(f"Confidence: {result['score']:.3f}")
    print()
```
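By default the pipeline returns only the top label. To inspect scores for all seven classes, pass `top_k=None` to the pipeline call, or apply a softmax to the raw logits yourself. A minimal, model-free sketch of the softmax step (the logit values here are illustrative, not real model output):

```python
import math

LABELS = ["NON-EXPLICIT", "SUGGESTIVE", "SEXUAL-REFERENCE", "EXPLICIT-SEXUAL",
          "EXPLICIT-OFFENSIVE", "EXPLICIT-VIOLENT", "EXPLICIT-DISCLAIMER"]

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

# Illustrative logits; in practice these come from
# model(**tokenizer(text, return_tensors="pt")).logits
logits = [2.1, 0.3, -0.5, -1.2, -0.8, -2.0, -2.4]
probs = softmax(logits)

# Print the full distribution, most probable class first
for label, p in sorted(zip(LABELS, probs), key=lambda pair: -pair[1]):
    print(f"{label:20s} {p:.3f}")
```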
Example Literary Classifications
NON-EXPLICIT
"The morning mist drifted across the Yorkshire moors as Elizabeth walked the familiar path to the village."
→ NON-EXPLICIT (confidence: 0.945)
SUGGESTIVE
"His hand lingered on hers as he helped her from the carriage, their fingers intertwining despite propriety."
→ SUGGESTIVE (confidence: 0.782)
SEXUAL-REFERENCE
"The lovers spent the night discovering each other's secrets beneath the silk sheets."
→ SEXUAL-REFERENCE (confidence: 0.834)
EXPLICIT-SEXUAL
"She gasped as he traced kisses down her neck, his hands exploring the curves of her body with growing urgency."
→ EXPLICIT-SEXUAL (confidence: 0.891)
EXPLICIT-OFFENSIVE
"'Damn your insolence,' he snarled, his voice thick with contempt and barely contained rage."
→ EXPLICIT-OFFENSIVE (confidence: 0.923)
EXPLICIT-VIOLENT
"The blade sank deep between his ribs, blood pooling on the cobblestones as life drained from his eyes."
→ EXPLICIT-VIOLENT (confidence: 0.856)
EXPLICIT-DISCLAIMER
"Content warning: This story contains mature themes including explicit sexual content and violence."
→ EXPLICIT-DISCLAIMER (confidence: 0.967)
Model Architecture
- Base Model: microsoft/deberta-v3-small
- Parameters: ~44M
- Max Sequence Length: 512 tokens
- Training: Fine-tuned with class weighting for imbalanced data
- Early Stopping: Implemented with patience=3 on macro F1 score
Training Details
- Framework: Transformers + PyTorch with MPS (Apple Silicon) acceleration
- Batch Size: 16 (training), 32 (evaluation)
- Learning Rate: 5e-5 with warmup
- Epochs: 1.1 (early stopped)
- Optimizer: AdamW with weight decay 0.01
- Class Weights: Applied to handle dataset imbalance
Limitations
Subtle Distinctions: The model sometimes struggles to distinguish between SUGGESTIVE and SEXUAL-REFERENCE categories due to their conceptual similarity.
Limited Violence Data: EXPLICIT-VIOLENT class has the lowest F1 score (0.478) due to limited training samples (1,552).
Context Dependency: Short text snippets may lack sufficient context for accurate classification.
Language: Primarily trained on English text content.
Domain Bias: Training data skews toward literary and review content; performance may vary on social media or informal text.
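Because input is capped at 512 tokens, longer documents must be split before classification. A simple sketch that chunks on whitespace with overlap and aggregates per-chunk labels by severity (word-based chunking only approximates the tokenizer's 512-token limit, and the severity ordering is an illustrative policy, not part of the model):

```python
def chunk_text(text, max_words=300, overlap=50):
    """Split text into overlapping word-based chunks.

    ~300 words comfortably fits the 512-token limit for typical English
    prose; the overlap preserves context across chunk boundaries.
    """
    words = text.split()
    if len(words) <= max_words:
        return [text]
    chunks, start = [], 0
    while start < len(words):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break
        start += max_words - overlap
    return chunks

# Illustrative severity ordering, most to least severe
SEVERITY = ["EXPLICIT-SEXUAL", "EXPLICIT-VIOLENT", "EXPLICIT-OFFENSIVE",
            "SEXUAL-REFERENCE", "SUGGESTIVE", "EXPLICIT-DISCLAIMER", "NON-EXPLICIT"]

def document_label(chunk_labels):
    """Aggregate per-chunk labels: report the most severe one."""
    return min(chunk_labels, key=SEVERITY.index)

print(document_label(["NON-EXPLICIT", "SUGGESTIVE", "NON-EXPLICIT"]))  # SUGGESTIVE
```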
Applications & Use Cases
- Digital Humanities: Analyze literary corpora for thematic patterns and evolution of content standards
- Library Science: Assist librarians in content cataloging and collection development
- Literary Research: Support scholars studying censorship, publishing history, and textual analysis
- Educational Technology: Help educators assess age-appropriateness of reading materials
- Publishing: Aid editors and publishers in content classification and marketing decisions
- Archive Management: Facilitate organization of historical texts and manuscripts
Ethical Considerations
- Academic Purpose: This model is designed for scholarly analysis and educational applications
- Human Review: Automated classifications should be reviewed by subject matter experts
- Historical Context: Model may reflect contemporary biases when analyzing historical texts
- Transparency: Classifications should be interpretable for academic and research contexts
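One practical way to operationalize the human-review point above is to route low-confidence predictions to an expert queue. A minimal sketch (the 0.70 threshold is an illustrative value to tune on a validation set, not a recommendation from the model authors):

```python
REVIEW_THRESHOLD = 0.70  # illustrative cutoff; tune on a validation set

def needs_human_review(result, threshold=REVIEW_THRESHOLD):
    """Flag pipeline outputs whose top score falls below the threshold.

    `result` is one dict from the text-classification pipeline,
    e.g. {"label": "SUGGESTIVE", "score": 0.52}.
    """
    return result["score"] < threshold

print(needs_human_review({"label": "SUGGESTIVE", "score": 0.52}))        # True
print(needs_human_review({"label": "EXPLICIT-SEXUAL", "score": 0.93}))   # False
```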
License
This model is released under the Apache 2.0 license.
Citation
```bibtex
@misc{literary-content-classifier-2025,
  title={Literary Content Analysis: Multi-Class Classification of Textual Explicitness with DeBERTa},
  author={Your Name},
  year={2025},
  url={https://huggingface.co/Mitchins/deberta-v3-small-literary-explicit-classifier}
}
```
Contact
For questions or issues, please open an issue on the model repository.