Finetuned RoBERTa Sentiment Model

Model Overview

This model is a fine-tuned version of cardiffnlp/twitter-xlm-roberta-base-sentiment-multilingual, trained on a custom dataset of YouTube comments. Fine-tuning targets sentiment analysis of YouTube comments, which often differ in tone, slang, and structure from text on other social media platforms. After fine-tuning, the model reached an accuracy of 80.17% on a held-out test set of YouTube comments.

Intended Use

The model is designed for sentiment analysis of YouTube comments. It accepts a list of text inputs (comments) and returns a sentiment label for each comment:

  • Positive
  • Neutral
  • Negative

This model can be used in applications such as video recommendation systems, content analysis dashboards, and other data analysis tasks where understanding audience sentiment is important.
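For a dashboard-style use case, the per-comment labels usually need to be aggregated per video. The following is a minimal sketch, assuming the labels have already been predicted; the function name `sentiment_distribution` and the sample predictions are hypothetical and not part of this project.

```python
from collections import Counter

# The three labels the model returns.
LABELS = ("Positive", "Neutral", "Negative")

def sentiment_distribution(labels):
    """Return the share of each sentiment label in a list of predictions."""
    counts = Counter(labels)
    total = len(labels)
    if total == 0:
        # No comments: report an all-zero distribution instead of dividing by zero.
        return {label: 0.0 for label in LABELS}
    return {label: counts[label] / total for label in LABELS}

# Hypothetical predictions for the comments on one video.
predicted = ["Positive", "Positive", "Negative", "Neutral"]
distribution = sentiment_distribution(predicted)
print(distribution)
```

Reporting shares rather than raw counts makes videos with very different comment volumes comparable.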

How It Was Trained

  • Base Model: cardiffnlp/twitter-xlm-roberta-base-sentiment-multilingual
  • Dataset: A custom dataset consisting of over 1 million YouTube comments, each annotated with one of three sentiment labels (Positive, Neutral, Negative).
  • Fine-Tuning Process: The model was fine-tuned using three Jupyter notebooks:
    • Data Cleaning and Preprocessing
    • Model Fine-Tuning
    • Evaluation and Testing
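
The notebooks themselves are not reproduced here. As an illustration only, a cleaning and label-encoding step for comment data might look like the sketch below. The specific cleaning rules (dropping URLs, collapsing whitespace, skipping empty comments) and the names `clean_comment` and `prepare_examples` are assumptions, not the project's actual preprocessing. The label IDs follow the mapping used in the "How to Use" example.

```python
import re

URL_PATTERN = re.compile(r"https?://\S+")
WHITESPACE_PATTERN = re.compile(r"\s+")

# Same label order as the mapping in the "How to Use" example.
LABEL_TO_ID = {"Negative": 0, "Neutral": 1, "Positive": 2}

def clean_comment(text):
    """Remove URLs and collapse runs of whitespace (assumed cleaning rules)."""
    text = URL_PATTERN.sub("", text)
    text = WHITESPACE_PATTERN.sub(" ", text)
    return text.strip()

def prepare_examples(rows):
    """Turn (comment, label) pairs into dicts with cleaned text and integer labels."""
    examples = []
    for comment, label in rows:
        cleaned = clean_comment(comment)
        # Skip comments that are empty after cleaning or carry an unknown label.
        if cleaned and label in LABEL_TO_ID:
            examples.append({"text": cleaned, "label": LABEL_TO_ID[label]})
    return examples

# Hypothetical annotated rows.
rows = [
    ("Great video!   Watch more at https://example.com/x", "Positive"),
    ("   ", "Neutral"),
    ("meh", "Neutral"),
]
examples = prepare_examples(rows)
print(examples)
```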

Evaluation

The model was evaluated on a held-out test set of YouTube comments. On this test set, accuracy rose from approximately 69.3% for the baseline (the base model, fine-tuned on Twitter data only) to 80.17% after fine-tuning on YouTube comments. The gain of roughly 11 percentage points suggests that domain-specific fine-tuning helps on this data.
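
To put the two accuracy figures in perspective, the short calculation below derives the absolute gain in percentage points and the relative reduction in error rate. It uses only the two numbers reported above; the variable names are illustrative.

```python
baseline_acc = 69.3    # percent, approximate baseline accuracy reported above
finetuned_acc = 80.17  # percent, accuracy after fine-tuning

# Absolute gain in percentage points.
absolute_gain = finetuned_acc - baseline_acc

# Relative reduction in error rate: how much of the baseline's error was removed.
baseline_err = 100 - baseline_acc
finetuned_err = 100 - finetuned_acc
relative_error_reduction = (baseline_err - finetuned_err) / baseline_err

print(f"Absolute gain: {absolute_gain:.2f} percentage points")
print(f"Relative error reduction: {relative_error_reduction:.1%}")
```

Under these figures the gain is about 10.9 percentage points, which corresponds to removing roughly 35% of the baseline's errors.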

Limitations

  • Domain Specificity: The model is fine-tuned on YouTube comments. Its performance on other types of text (e.g., tweets, reviews) may be lower.
  • Size: At around 1.2 GB, the model may be difficult to deploy in environments with limited memory.
  • Biases: As with any sentiment model, predictions may reflect biases in the training data. Users are encouraged to evaluate the model on their own use case and report any issues.
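
The size figure can be sanity-checked from the parameter count. The sketch below is a back-of-the-envelope estimate, assuming the rounded 278M parameter count listed for this model and 4 bytes per float32 weight. It ignores tokenizer files and runtime overhead, so actual disk and memory use will differ somewhat.

```python
NUM_PARAMS = 278_000_000  # rounded parameter count listed for this model
BYTES_PER_PARAM = {"float32": 4, "float16": 2}

def weight_size_gb(num_params, dtype):
    """Approximate size of the weights alone, in decimal gigabytes."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9

fp32_gb = weight_size_gb(NUM_PARAMS, "float32")
fp16_gb = weight_size_gb(NUM_PARAMS, "float16")

print(f"float32 weights: ~{fp32_gb:.2f} GB")
print(f"float16 weights: ~{fp16_gb:.2f} GB")
```

The float32 estimate of about 1.1 GB is in line with the stated size. Storing the weights in half precision would roughly halve it, though the effect on accuracy has not been measured here.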

How to Use

The model can be used via an API endpoint or loaded locally using the Hugging Face Transformers library. For example, using Python:

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "AmaanP314/youtube-xlm-roberta-base-sentiment-multilingual"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Example input
comments = [
    "This video was amazing!",
    "I didn't like the content.",
    "It was just okay."
]

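# Tokenize the batch; padding and truncation let comments of different lengths share one tensor
inputs = tokenizer(comments, return_tensors="pt", padding=True, truncation=True)

# Run inference without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)

# Pick the highest-scoring class for each comment and map it to a label
predictions = torch.argmax(outputs.logits, dim=1)
label_mapping = {0: "Negative", 1: "Neutral", 2: "Positive"}
sentiments = [label_mapping[p.item()] for p in predictions]
print(sentiments)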
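
The example above returns only the top label. When a confidence score is also useful, the logits can be passed through a softmax. The sketch below shows the idea in plain Python on hypothetical logits for a single comment. The function names and the logit values are illustrative. With the tensors from the example above, `torch.softmax(outputs.logits, dim=1)` performs the same computation for the whole batch.

```python
import math

# Same mapping as in the example above.
LABEL_MAPPING = {0: "Negative", 1: "Neutral", 2: "Positive"}

def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def label_with_confidence(logits):
    """Return the most likely label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABEL_MAPPING[best], probs[best]

# Hypothetical logits for one comment, not real model output.
example_logits = [-1.2, 0.3, 2.5]
label, confidence = label_with_confidence(example_logits)
print(label, round(confidence, 3))
```

A confidence threshold can then be used to route low-confidence comments to manual review, if the application calls for it.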

Citation

If you use this model in your research, please cite the original base model and this project:

@misc{cardiffnlp,
  title={Twitter-XLM-RoBERTa-Base-Sentiment-Multilingual},
  author={Cardiff NLP},
  year={2020},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/cardiffnlp/twitter-xlm-roberta-base-sentiment-multilingual}}
}

@misc{your_project,
  title={Finetuned RoBERTa Sentiment Model for YouTube Comments},
  author={Your Name},
  year={2025},
  howpublished={\url{https://huggingface.co/AmaanP314/youtube-xlm-roberta-base-sentiment-multilingual}}
}

Model size: 278M params (Safetensors, tensor type F32)