Literary Content Analysis - DeBERTa v3 Small
A fine-tuned DeBERTa-v3-small model that classifies literary and textual content into 7 categories of explicitness, supporting content classification for literary research, digital humanities, and content curation.
Model Description
This model analyzes textual content across 7 categories of explicitness, providing researchers, librarians, and content curators with nuanced classification capabilities for literary and media analysis:
- NON-EXPLICIT: Clean, family-friendly content
- SUGGESTIVE: Mild innuendo or romantic themes without explicit detail
- SEXUAL-REFERENCE: Mentions of sexual topics without graphic description
- EXPLICIT-SEXUAL: Graphic sexual content and detailed intimate scenes
- EXPLICIT-OFFENSIVE: Profanity, crude language, and offensive content
- EXPLICIT-VIOLENT: Violent or disturbing content
- EXPLICIT-DISCLAIMER: Content warnings and age restriction notices
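For downstream processing it can help to keep the seven labels in a fixed structure. A minimal sketch (the label list mirrors the categories above; the authoritative id-to-label mapping lives in the model's `config.json` as `model.config.id2label`, and the `is_explicit` grouping is an illustrative policy, not part of the model):

```python
# Label set as documented above; check model.config.id2label for the
# authoritative order used by the model itself.
LABELS = [
    "NON-EXPLICIT",
    "SUGGESTIVE",
    "SEXUAL-REFERENCE",
    "EXPLICIT-SEXUAL",
    "EXPLICIT-OFFENSIVE",
    "EXPLICIT-VIOLENT",
    "EXPLICIT-DISCLAIMER",
]

# Illustrative grouping for simple filtering policies. Note that this
# treats EXPLICIT-DISCLAIMER (a content warning, not explicit content
# itself) as explicit; adjust to your use case.
EXPLICIT_LABELS = {label for label in LABELS if label.startswith("EXPLICIT-")}

def is_explicit(label: str) -> bool:
    """True for the four EXPLICIT-* categories."""
    return label in EXPLICIT_LABELS
```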
Model Performance
Test Set Results (2,000 samples):
- Accuracy: 77.3%
- Macro F1: 0.709
- Weighted F1: 0.779
Per-Class Performance:
| Category | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| EXPLICIT-DISCLAIMER | 0.864 | 1.000 | 0.927 | 19 |
| EXPLICIT-SEXUAL | 0.927 | 0.909 | 0.918 | 514 |
| EXPLICIT-OFFENSIVE | 0.749 | 0.877 | 0.808 | 414 |
| NON-EXPLICIT | 0.880 | 0.696 | 0.777 | 683 |
| SEXUAL-REFERENCE | 0.637 | 0.679 | 0.658 | 212 |
| EXPLICIT-VIOLENT | 0.500 | 0.458 | 0.478 | 24 |
| SUGGESTIVE | 0.333 | 0.500 | 0.400 | 134 |
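The aggregate scores follow directly from the per-class table: macro F1 is the unweighted mean of the seven F1 scores, while weighted F1 weights each class by its support. A quick check in plain Python (small discrepancies versus the reported numbers come from the three-decimal rounding in the table):

```python
# (F1, support) pairs taken from the per-class table above
per_class = {
    "EXPLICIT-DISCLAIMER": (0.927, 19),
    "EXPLICIT-SEXUAL":     (0.918, 514),
    "EXPLICIT-OFFENSIVE":  (0.808, 414),
    "NON-EXPLICIT":        (0.777, 683),
    "SEXUAL-REFERENCE":    (0.658, 212),
    "EXPLICIT-VIOLENT":    (0.478, 24),
    "SUGGESTIVE":          (0.400, 134),
}

f1s = [f1 for f1, _ in per_class.values()]
total_support = sum(n for _, n in per_class.values())  # 2,000 test samples

macro_f1 = sum(f1s) / len(f1s)
weighted_f1 = sum(f1 * n for f1, n in per_class.values()) / total_support

print(f"Macro F1:    {macro_f1:.3f}")     # 0.709, matching the reported score
print(f"Weighted F1: {weighted_f1:.3f}")  # ~0.780 from rounded table values (0.779 reported)
```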
Training Data
Dataset Size: 124,144 samples across 7 categories
- Original categorized paragraphs: 123,144 samples
- Generated disclaimers: 1,000 samples
Data Sources:
- Diverse text content including books, articles, reviews, and discussions
- Synthetic disclaimer examples generated for comprehensive coverage
- Cross-media content types (books, games, videos, podcasts, art, etc.)
Class Distribution:
- NON-EXPLICIT: 43,470 samples (35.0%)
- EXPLICIT-SEXUAL: 31,696 samples (25.5%)
- EXPLICIT-OFFENSIVE: 25,561 samples (20.6%)
- SEXUAL-REFERENCE: 12,379 samples (10.0%)
- SUGGESTIVE: 8,313 samples (6.7%)
- EXPLICIT-VIOLENT: 1,552 samples (1.3%)
- EXPLICIT-DISCLAIMER: 1,173 samples (0.9%)
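The training section notes that class weights were applied, but the exact scheme is not documented. A common choice for imbalance like this is inverse-frequency weighting, sketched here from the class counts above (this is an assumed formulation, not the model's confirmed recipe):

```python
# Class counts from the distribution above
counts = {
    "NON-EXPLICIT":        43470,
    "EXPLICIT-SEXUAL":     31696,
    "EXPLICIT-OFFENSIVE":  25561,
    "SEXUAL-REFERENCE":    12379,
    "SUGGESTIVE":           8313,
    "EXPLICIT-VIOLENT":     1552,
    "EXPLICIT-DISCLAIMER":  1173,
}

total = sum(counts.values())   # 124,144 samples
num_classes = len(counts)      # 7

# Inverse-frequency weighting: weight_c = N / (K * n_c), so rare classes
# (e.g. EXPLICIT-DISCLAIMER) get proportionally larger loss weights.
weights = {label: total / (num_classes * n) for label, n in counts.items()}

for label, w in sorted(weights.items(), key=lambda kv: kv[1]):
    print(f"{label:20s} {w:6.2f}")
```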
Usage
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

# Load model and tokenizer
model = AutoModelForSequenceClassification.from_pretrained("Mitchins/deberta-v3-small-literary-explicit-classifier")
tokenizer = AutoTokenizer.from_pretrained("Mitchins/deberta-v3-small-literary-explicit-classifier")

# Create classification pipeline
classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)

# Literary analysis example
literary_texts = [
    "The author's exploration of human sexuality reflects the broader themes of Victorian literature.",
    "This passage contains Shakespearean double entendres typical of the period.",
    "Content advisory: This edition includes uncensored material from the original manuscript.",
    "A bildungsroman examining themes of coming-of-age and moral development."
]

results = classifier(literary_texts)
for text, result in zip(literary_texts, results):
    print(f"Text: {text}")
    print(f"Literary Category: {result['label']}")
    print(f"Confidence: {result['score']:.3f}")
    print()
```
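By default the pipeline returns only the top label. To inspect scores for all seven classes, pass `top_k=None` to the pipeline call, or apply a softmax to the raw logits yourself. A minimal, model-free sketch of the softmax step (the logit values here are illustrative, not real model output):

```python
import math

LABELS = ["NON-EXPLICIT", "SUGGESTIVE", "SEXUAL-REFERENCE", "EXPLICIT-SEXUAL",
          "EXPLICIT-OFFENSIVE", "EXPLICIT-VIOLENT", "EXPLICIT-DISCLAIMER"]

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

# Illustrative logits; in practice these come from
# model(**tokenizer(text, return_tensors="pt")).logits
logits = [2.1, 0.3, -0.5, -1.2, -0.8, -2.0, -2.4]
probs = softmax(logits)

# Print the full distribution, most probable class first
for label, p in sorted(zip(LABELS, probs), key=lambda pair: -pair[1]):
    print(f"{label:20s} {p:.3f}")
```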
Example Literary Classifications
NON-EXPLICIT
"The morning mist drifted across the Yorkshire moors as Elizabeth walked the familiar path to the village."
→ NON-EXPLICIT (confidence: 0.945)
SUGGESTIVE
"His hand lingered on hers as he helped her from the carriage, their fingers intertwining despite propriety."
→ SUGGESTIVE (confidence: 0.782)
SEXUAL-REFERENCE
"The lovers spent the night discovering each other's secrets beneath the silk sheets."
→ SEXUAL-REFERENCE (confidence: 0.834)
EXPLICIT-SEXUAL
"She gasped as he traced kisses down her neck, his hands exploring the curves of her body with growing urgency."
→ EXPLICIT-SEXUAL (confidence: 0.891)
EXPLICIT-OFFENSIVE
"'Damn your insolence,' he snarled, his voice thick with contempt and barely contained rage."
→ EXPLICIT-OFFENSIVE (confidence: 0.923)
EXPLICIT-VIOLENT
"The blade sank deep between his ribs, blood pooling on the cobblestones as life drained from his eyes."
→ EXPLICIT-VIOLENT (confidence: 0.856)
EXPLICIT-DISCLAIMER
"Content warning: This story contains mature themes including explicit sexual content and violence."
→ EXPLICIT-DISCLAIMER (confidence: 0.967)
Model Architecture
- Base Model: microsoft/deberta-v3-small
- Parameters: ~44M
- Max Sequence Length: 512 tokens
- Training: Fine-tuned with class weighting for imbalanced data
- Early Stopping: Implemented with patience=3 on macro F1 score
Training Details
- Framework: Transformers + PyTorch with MPS (Apple Silicon) acceleration
- Batch Size: 16 (training), 32 (evaluation)
- Learning Rate: 5e-5 with warmup
- Epochs: 1.1 (early stopped)
- Optimizer: AdamW with weight decay 0.01
- Class Weights: Applied to handle dataset imbalance
Limitations
Subtle Distinctions: The model sometimes struggles to distinguish between SUGGESTIVE and SEXUAL-REFERENCE categories due to their conceptual similarity.
Limited Violence Data: EXPLICIT-VIOLENT class has the lowest F1 score (0.478) due to limited training samples (1,552).
Context Dependency: Short text snippets may lack sufficient context for accurate classification.
Language: Primarily trained on English text content.
Domain Bias: Training data skews toward literary and review content; performance may vary on social media or informal text.
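Because input is capped at 512 tokens, longer documents must be split before classification. A simple sketch that chunks on whitespace with overlap and aggregates per-chunk labels by severity (word-based chunking only approximates the tokenizer's 512-token limit, and the severity ordering is an illustrative policy, not part of the model):

```python
def chunk_text(text, max_words=300, overlap=50):
    """Split text into overlapping word-based chunks.

    ~300 words comfortably fits the 512-token limit for typical English
    prose; the overlap preserves context across chunk boundaries.
    """
    words = text.split()
    if len(words) <= max_words:
        return [text]
    chunks, start = [], 0
    while start < len(words):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break
        start += max_words - overlap
    return chunks

# Illustrative severity ordering, most to least severe
SEVERITY = ["EXPLICIT-SEXUAL", "EXPLICIT-VIOLENT", "EXPLICIT-OFFENSIVE",
            "SEXUAL-REFERENCE", "SUGGESTIVE", "EXPLICIT-DISCLAIMER", "NON-EXPLICIT"]

def document_label(chunk_labels):
    """Aggregate per-chunk labels: report the most severe one."""
    return min(chunk_labels, key=SEVERITY.index)

print(document_label(["NON-EXPLICIT", "SUGGESTIVE", "NON-EXPLICIT"]))  # SUGGESTIVE
```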
Applications & Use Cases
- Digital Humanities: Analyze literary corpora for thematic patterns and evolution of content standards
- Library Science: Assist librarians in content cataloging and collection development
- Literary Research: Support scholars studying censorship, publishing history, and textual analysis
- Educational Technology: Help educators assess age-appropriateness of reading materials
- Publishing: Aid editors and publishers in content classification and marketing decisions
- Archive Management: Facilitate organization of historical texts and manuscripts
Ethical Considerations
- Academic Purpose: This model is designed for scholarly analysis and educational applications
- Human Review: Automated classifications should be reviewed by subject matter experts
- Historical Context: Model may reflect contemporary biases when analyzing historical texts
- Transparency: Classifications should be interpretable for academic and research contexts
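One practical way to operationalize the human-review point above is to route low-confidence predictions to an expert queue. A minimal sketch (the 0.70 threshold is an illustrative value to tune on a validation set, not a recommendation from the model authors):

```python
REVIEW_THRESHOLD = 0.70  # illustrative cutoff; tune on a validation set

def needs_human_review(result, threshold=REVIEW_THRESHOLD):
    """Flag pipeline outputs whose top score falls below the threshold.

    `result` is one dict from the text-classification pipeline,
    e.g. {"label": "SUGGESTIVE", "score": 0.52}.
    """
    return result["score"] < threshold

print(needs_human_review({"label": "SUGGESTIVE", "score": 0.52}))        # True
print(needs_human_review({"label": "EXPLICIT-SEXUAL", "score": 0.93}))   # False
```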
License
This model is released under the Apache 2.0 license.
Citation
```bibtex
@misc{literary-content-classifier-2025,
  title={Literary Content Analysis: Multi-Class Classification of Textual Explicitness with DeBERTa},
  author={Your Name},
  year={2025},
  url={https://huggingface.co/Mitchins/deberta-v3-small-literary-explicit-classifier}
}
```
Contact
For questions or issues, please open an issue on the model repository.