MeowML/ToxicBERT - Turkish Toxic Language Detection

Model Description

ToxicBERT is a BERT model fine-tuned to detect toxic language in Turkish text. Built on the dbmdz/bert-base-turkish-cased base model, the classifier identifies potentially harmful, offensive, or toxic content in Turkish social media posts, comments, and general text.

Model Details

  • Model Type: Text Classification (Binary)
  • Language: Turkish (tr)
  • Base Model: dbmdz/bert-base-turkish-cased
  • License: MIT
  • Library: Transformers
  • Task: Toxicity Detection

Intended Use

Primary Use Cases

  • Content moderation for Turkish social media platforms
  • Automated filtering of user-generated content
  • Research in Turkish NLP and toxicity detection
  • Educational purposes for understanding toxic language patterns

Out-of-Scope Use

  • This model should not be the sole decision-maker in content moderation; human oversight is required
  • Not suitable for languages other than Turkish
  • Should not be used for sensitive applications without proper validation and testing

Training Data

The model was trained on the Overfit-GM/turkish-toxic-language dataset, which contains Turkish text samples labeled for toxicity. The dataset includes various forms of toxic content commonly found in online Turkish communications.
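
For reference, the dataset can be loaded with the Hugging Face datasets library. A minimal sketch; the split and column names below are assumptions, so inspect the dataset before relying on them:

    from datasets import load_dataset

    # Load the Overfit-GM/turkish-toxic-language dataset from the Hub
    dataset = load_dataset("Overfit-GM/turkish-toxic-language")

    # Inspect splits, columns, and a sample row before relying on field names
    print(dataset)
    print(dataset["train"][0])  # assumes a "train" split exists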

Model Outputs

For each input text, the model produces:

  • Binary Classification: 0 (non-toxic) or 1 (toxic)
  • Confidence Score: the probability of the predicted class (the larger of the two softmax outputs)
  • Toxic Probability: the softmax probability assigned to the toxic class (label 1)
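
The same outputs can be obtained with the transformers pipeline API. A minimal sketch; the label names shown (e.g. LABEL_0/LABEL_1) depend on the model's config and are an assumption here:

    from transformers import pipeline

    # Text-classification pipeline; the tokenizer is loaded from the base model
    classifier = pipeline(
        "text-classification",
        model="MeowML/ToxicBERT",
        tokenizer="dbmdz/bert-base-turkish-cased",
        top_k=None,  # return scores for both classes, not just the top one
    )

    scores = classifier("Merhaba, nasılsın?")
    print(scores)  # e.g. [{'label': 'LABEL_0', 'score': ...}, {'label': 'LABEL_1', 'score': ...}]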

Usage

Quick Start

    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    # Load the tokenizer from the base model and the classifier weights
    # from the fine-tuned repository
    tokenizer = AutoTokenizer.from_pretrained("dbmdz/bert-base-turkish-cased")
    model = AutoModelForSequenceClassification.from_pretrained("MeowML/ToxicBERT")

    # Prepare text
    text = "Merhaba, nasılsın?"
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=256)

    # Get prediction
    with torch.no_grad():
        outputs = model(**inputs)
        probabilities = torch.nn.functional.softmax(outputs.logits, dim=-1)
        prediction = torch.argmax(probabilities, dim=-1)
        
    toxic_probability = probabilities[0][1].item()
    is_toxic = bool(prediction.item())

    print(f"Is toxic: {is_toxic}")
    print(f"Toxic probability: {toxic_probability:.4f}")

Advanced Usage with Custom Class

    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    class ToxicLanguageDetector:
        def __init__(self, model_name="MeowML/ToxicBERT"):
            # Tokenizer comes from the base model; weights from the fine-tuned repo
            self.tokenizer = AutoTokenizer.from_pretrained("dbmdz/bert-base-turkish-cased")
            self.model = AutoModelForSequenceClassification.from_pretrained(model_name)
            self.device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
            self.model.to(self.device)
            self.model.eval()
            
        def predict(self, text):
            inputs = self.tokenizer(
                text,
                truncation=True,
                padding='max_length',
                max_length=256,
                return_tensors='pt'
            ).to(self.device)
            
            with torch.no_grad():
                outputs = self.model(**inputs)
                probabilities = torch.nn.functional.softmax(outputs.logits, dim=-1)
                prediction = torch.argmax(probabilities, dim=-1)
            
            return {
                'text': text,
                'is_toxic': bool(prediction.item()),
                'toxic_probability': probabilities[0][1].item(),
                'confidence': probabilities[0].max().item()
            }

    # Usage
    detector = ToxicLanguageDetector()
    result = detector.predict("Merhaba, nasılsın?")
    print(result)
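
For larger workloads, a batched variant can be added to the class above. A sketch under the same assumptions; the batch size is illustrative:

    # A batched prediction method that could be added to ToxicLanguageDetector
    def predict_batch(self, texts, batch_size=32):
        """Classify a list of texts in mini-batches; returns one dict per text."""
        results = []
        for i in range(0, len(texts), batch_size):
            batch = texts[i:i + batch_size]
            inputs = self.tokenizer(
                batch,
                truncation=True,
                padding=True,
                max_length=256,
                return_tensors='pt'
            ).to(self.device)
            with torch.no_grad():
                probabilities = torch.nn.functional.softmax(
                    self.model(**inputs).logits, dim=-1
                )
            for text, probs in zip(batch, probabilities):
                results.append({
                    'text': text,
                    'is_toxic': bool(probs.argmax().item()),
                    'toxic_probability': probs[1].item(),
                    'confidence': probs.max().item()
                })
        return results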

Limitations and Biases

Limitations

  • The model's performance depends heavily on the training data quality and coverage
  • May have difficulty with context-dependent toxicity (sarcasm, irony)
  • Performance may vary across different Turkish dialects or informal language
  • Shorter texts might be more challenging to classify accurately

Potential Biases

  • The model may reflect biases present in the training dataset
  • Certain topics, demographics, or linguistic patterns might be over- or under-represented
  • Regular evaluation and bias testing are recommended for production use

Ethical Considerations

  • This model should be used responsibly with human oversight
  • False positives and negatives are expected and should be accounted for (see the threshold sketch after this list)
  • Consider the impact on freedom of expression when implementing automated moderation
  • Regular auditing and updating are recommended to maintain fairness
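
One concrete way to account for false positives and negatives is to replace the default argmax decision with a tunable threshold on the toxic probability. A minimal sketch using the ToxicLanguageDetector class above; the 0.7 cutoff is illustrative and should be tuned on held-out data:

    # Raising the threshold reduces false positives at the cost of recall
    THRESHOLD = 0.7  # illustrative value; tune on a validation set

    result = detector.predict("örnek metin")
    is_toxic = result['toxic_probability'] >= THRESHOLD
    print(f"Toxic at threshold {THRESHOLD}: {is_toxic}")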

Technical Specifications

  • Input: Turkish text strings (truncated to 256 tokens)
  • Output: Binary classification with probability scores
  • Model Size: ~111M parameters (BERT-base architecture), ~440 MB in FP32
  • Inference: Runs on both CPU and GPU
  • Memory Requirements: Fits on standard hardware; see the half-precision sketch below for reducing GPU memory
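
When GPU memory is tight, the weights can be cast to half precision. A minimal sketch; it assumes a CUDA device is available, and FP16 may shift the output probabilities slightly:

    import torch
    from transformers import AutoModelForSequenceClassification

    model = AutoModelForSequenceClassification.from_pretrained("MeowML/ToxicBERT")
    if torch.cuda.is_available():
        # FP16 halves the weight footprint (~220 MB vs. ~440 MB in FP32)
        model = model.half().to("cuda")
    model.eval()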

Citation

If you use this model in your research or applications, please cite:

    @misc{meowml_toxicbert_2024,
      title={ToxicBERT: Turkish Toxic Language Detection},
      author={MeowML},
      year={2024},
      publisher={Hugging Face},
      url={https://huggingface.co/MeowML/ToxicBERT}
    }

Acknowledgments

  • Base model: dbmdz/bert-base-turkish-cased
  • Training dataset: Overfit-GM/turkish-toxic-language
  • Built with Hugging Face Transformers library

Contact

For questions, issues, or suggestions, please open an issue in the model repository or contact the MeowML team.


Disclaimer: This model is provided for research and educational purposes. Users are responsible for ensuring appropriate and ethical use in their applications.
