Literary Content Analysis - DeBERTa v3 Small

A fine-tuned DeBERTa-v3-small model that classifies literary and other textual content into 7 categories of explicitness, built for literary research, digital humanities, and content curation.

Model Description

This model assigns each passage one of 7 explicitness categories, giving researchers, librarians, and content curators fine-grained labels for literary and media analysis:

  • NON-EXPLICIT: Clean, family-friendly content
  • SUGGESTIVE: Mild innuendo or romantic themes without explicit detail
  • SEXUAL-REFERENCE: Mentions of sexual topics without graphic description
  • EXPLICIT-SEXUAL: Graphic sexual content and detailed intimate scenes
  • EXPLICIT-OFFENSIVE: Profanity, crude language, and offensive content
  • EXPLICIT-VIOLENT: Violent or disturbing content
  • EXPLICIT-DISCLAIMER: Content warnings and age restriction notices
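
The label mapping ships with the model configuration, so the category set can be verified programmatically. A minimal check, assuming only the standard id2label field that fine-tuned classifiers publish:

from transformers import AutoConfig

# Load only the config; prints the index-to-category mapping for the 7 classes above
config = AutoConfig.from_pretrained("Mitchins/deberta-v3-small-literary-explicit-classifier")
print(config.id2label)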

Model Performance

Test Set Results (2,000 samples):

  • Accuracy: 77.3%
  • Macro F1: 0.709
  • Weighted F1: 0.779

Per-Class Performance:

Category              Precision   Recall   F1-Score   Support
EXPLICIT-DISCLAIMER     0.864     1.000      0.927        19
EXPLICIT-SEXUAL         0.927     0.909      0.918       514
EXPLICIT-OFFENSIVE      0.749     0.877      0.808       414
NON-EXPLICIT            0.880     0.696      0.777       683
SEXUAL-REFERENCE        0.637     0.679      0.658       212
EXPLICIT-VIOLENT        0.500     0.458      0.478        24
SUGGESTIVE              0.333     0.500      0.400       134
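
These per-class figures can be recomputed on any labeled test split with scikit-learn's classification_report. A minimal sketch; the two-example "test set" below is purely illustrative, since the real split is not distributed with the model:

from sklearn.metrics import classification_report
from transformers import pipeline

classifier = pipeline("text-classification",
                      model="Mitchins/deberta-v3-small-literary-explicit-classifier")

# Stand-in for a real labeled test split
texts = ["The morning mist drifted across the moors.",
         "Content warning: this story contains mature themes."]
gold = ["NON-EXPLICIT", "EXPLICIT-DISCLAIMER"]

preds = [r["label"] for r in classifier(texts, truncation=True)]
print(classification_report(gold, preds, digits=3))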

Training Data

Dataset Size: 124,144 samples across 7 categories

  • Original categorized paragraphs: 123,144 samples
  • Generated disclaimers: 1,000 samples

Data Sources:

  • Diverse text content including books, articles, reviews, and discussions
  • Synthetic disclaimer examples generated for comprehensive coverage
  • Cross-media content types (books, games, videos, podcasts, art, etc.)

Class Distribution:

  • NON-EXPLICIT: 43,470 samples (35.0%)
  • EXPLICIT-SEXUAL: 31,696 samples (25.5%)
  • EXPLICIT-OFFENSIVE: 25,561 samples (20.6%)
  • SEXUAL-REFERENCE: 12,379 samples (10.0%)
  • SUGGESTIVE: 8,313 samples (6.7%)
  • EXPLICIT-VIOLENT: 1,552 samples (1.3%)
  • EXPLICIT-DISCLAIMER: 1,173 samples (0.9%)
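
The class weighting mentioned under Training Details can be derived from these counts. A minimal sketch using inverse-frequency weights; the exact scheme used in training is not documented, so the numbers are illustrative:

# Inverse-frequency class weights computed from the distribution above
counts = {
    "NON-EXPLICIT": 43470,
    "EXPLICIT-SEXUAL": 31696,
    "EXPLICIT-OFFENSIVE": 25561,
    "SEXUAL-REFERENCE": 12379,
    "SUGGESTIVE": 8313,
    "EXPLICIT-VIOLENT": 1552,
    "EXPLICIT-DISCLAIMER": 1173,
}
total, n_classes = sum(counts.values()), len(counts)
weights = {label: total / (n_classes * n) for label, n in counts.items()}
for label, weight in weights.items():
    # Rare classes such as EXPLICIT-DISCLAIMER receive the largest weights
    print(f"{label}: {weight:.2f}")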

Usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

# Load model and tokenizer  
model = AutoModelForSequenceClassification.from_pretrained("Mitchins/deberta-v3-small-literary-explicit-classifier")
tokenizer = AutoTokenizer.from_pretrained("Mitchins/deberta-v3-small-literary-explicit-classifier")

# Create classification pipeline
classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)

# Literary analysis example
literary_texts = [
    "The author's exploration of human sexuality reflects the broader themes of Victorian literature.",
    "This passage contains Shakespearean double entendres typical of the period.",
    "Content advisory: This edition includes uncensored material from the original manuscript.",
    "A bildungsroman examining themes of coming-of-age and moral development."
]

results = classifier(literary_texts)
for text, result in zip(literary_texts, results):
    print(f"Text: {text}")
    print(f"Literary Category: {result['label']}")
    print(f"Confidence: {result['score']:.3f}")
    print()
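
To inspect the full probability distribution over all 7 categories rather than only the top label, build the pipeline with top_k=None:

# Reuses model and tokenizer from the snippet above
full_classifier = pipeline("text-classification", model=model, tokenizer=tokenizer, top_k=None)
results = full_classifier(["His hand lingered on hers as he helped her from the carriage."])
for entry in sorted(results[0], key=lambda e: e["score"], reverse=True):
    print(f"{entry['label']}: {entry['score']:.3f}")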

Example Literary Classifications

NON-EXPLICIT

"The morning mist drifted across the Yorkshire moors as Elizabeth walked the familiar path to the village."
→ NON-EXPLICIT (confidence: 0.945)

SUGGESTIVE

"His hand lingered on hers as he helped her from the carriage, their fingers intertwining despite propriety."
→ SUGGESTIVE (confidence: 0.782)

SEXUAL-REFERENCE

"The lovers spent the night discovering each other's secrets beneath the silk sheets."
→ SEXUAL-REFERENCE (confidence: 0.834)

EXPLICIT-SEXUAL

"She gasped as he traced kisses down her neck, his hands exploring the curves of her body with growing urgency."
→ EXPLICIT-SEXUAL (confidence: 0.891)

EXPLICIT-OFFENSIVE

"'Damn your insolence,' he snarled, his voice thick with contempt and barely contained rage."
→ EXPLICIT-OFFENSIVE (confidence: 0.923)

EXPLICIT-VIOLENT

"The blade sank deep between his ribs, blood pooling on the cobblestones as life drained from his eyes."
→ EXPLICIT-VIOLENT (confidence: 0.856)

EXPLICIT-DISCLAIMER

"Content warning: This story contains mature themes including explicit sexual content and violence."
→ EXPLICIT-DISCLAIMER (confidence: 0.967)

Model Architecture

  • Base Model: microsoft/deberta-v3-small
  • Parameters: ~44M backbone (~142M total including the embedding layer)
  • Max Sequence Length: 512 tokens (longer inputs must be truncated; see the sketch below)
  • Training: Fine-tuned with class weighting for imbalanced data
  • Early Stopping: Implemented with patience=3 on macro F1 score
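
Because the maximum sequence length is 512 tokens, longer passages need truncation before classification. A minimal sketch; the repeated sentence is only an illustrative over-length input:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Mitchins/deberta-v3-small-literary-explicit-classifier")
long_passage = "The chapter ran on at considerable length. " * 200
encoded = tokenizer(long_passage, truncation=True, max_length=512, return_tensors="pt")
print(encoded["input_ids"].shape)  # torch.Size([1, 512]): everything past 512 tokens is dropped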

Training Details

  • Framework: Transformers + PyTorch with MPS (Apple Silicon) acceleration
  • Batch Size: 16 (training), 32 (evaluation)
  • Learning Rate: 5e-5 with warmup
  • Epochs: 1.1 (early stopped)
  • Optimizer: AdamW with weight decay 0.01
  • Class Weights: Applied to handle dataset imbalance
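
Class weighting is not a built-in Trainer option, so it was presumably wired in through a custom loss. A minimal sketch of one common pattern, a weighted cross-entropy loss inside a Trainer subclass; the subclass itself is an assumption about how training was implemented, not the confirmed training code:

import torch
from transformers import Trainer

class WeightedLossTrainer(Trainer):
    """Trainer variant that applies per-class weights to cross-entropy."""

    def __init__(self, class_weights, **kwargs):
        super().__init__(**kwargs)
        self.class_weights = class_weights  # e.g. a torch.tensor built from the class distribution above

    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        labels = inputs.pop("labels")
        outputs = model(**inputs)
        loss_fct = torch.nn.CrossEntropyLoss(
            weight=self.class_weights.to(outputs.logits.device))
        loss = loss_fct(outputs.logits, labels)
        return (loss, outputs) if return_outputs else loss

Early stopping on macro F1 with patience 3 would then correspond to adding transformers.EarlyStoppingCallback(early_stopping_patience=3) together with metric_for_best_model="eval_macro_f1" in the training arguments, assuming the compute_metrics function reports that key.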

Limitations

  1. Subtle Distinctions: The model sometimes struggles to distinguish between SUGGESTIVE and SEXUAL-REFERENCE categories due to their conceptual similarity.

  2. Limited Violence Data: The EXPLICIT-VIOLENT class performs poorly (F1 = 0.478, second-lowest after SUGGESTIVE), reflecting its small training sample (1,552 examples, 1.3% of the data).

  3. Context Dependency: Short text snippets may lack sufficient context for accurate classification.

  4. Language: Primarily trained on English text content.

  5. Domain Bias: Training data skews toward literary and review content; performance may vary on social media or informal text.

Applications & Use Cases

  • Digital Humanities: Analyze literary corpora for thematic patterns and evolution of content standards
  • Library Science: Assist librarians in content cataloging and collection development
  • Literary Research: Support scholars studying censorship, publishing history, and textual analysis
  • Educational Technology: Help educators assess age-appropriateness of reading materials
  • Publishing: Aid editors and publishers in content classification and marketing decisions
  • Archive Management: Facilitate organization of historical texts and manuscripts

Ethical Considerations

  • Academic Purpose: This model is designed for scholarly analysis and educational applications
  • Human Review: Automated classifications should be reviewed by subject matter experts
  • Historical Context: Model may reflect contemporary biases when analyzing historical texts
  • Transparency: Classifications should be interpretable for academic and research contexts

License

This model is released under the Apache 2.0 license.

Citation

@misc{literary-content-classifier-2025,
  title={Literary Content Analysis: Multi-Class Classification of Textual Explicitness with DeBERTa},
  author={Your Name},
  year={2025},
  url={https://huggingface.co/Mitchins/deberta-v3-small-literary-explicit-classifier}
}

Contact

For questions or issues, please open an issue on the model repository.
