IndoBERT Sentiment Analysis

This model is a fine-tuned version of indobenchmark/indobert-base-p1 for sentiment classification of Indonesian-language text.

✨ Dataset

15,027 tweets collected by scraping Twitter/X.

✨ Preprocessing

  • Remove duplicates
  • Data cleaning
  • Case folding
  • Word normalization
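
The four steps above could be sketched roughly as follows, applied in the order listed. This is a minimal illustration and not the author's pipeline: the regular expressions, the `SLANG_MAP` dictionary, and all function names are assumptions, since the source gives only the step names.

```python
import re

# Hypothetical slang-to-standard map; the real normalization dictionary is not given.
SLANG_MAP = {"gak": "tidak", "bgt": "banget", "yg": "yang"}

def remove_duplicates(tweets):
    """Drop exact duplicate tweets while keeping first-seen order."""
    return list(dict.fromkeys(tweets))

def clean(text):
    """Strip URLs, mentions, hashtag symbols, and non-letter characters."""
    text = re.sub(r"https?://\S+", " ", text)   # URLs
    text = re.sub(r"@\w+", " ", text)           # user mentions
    text = text.replace("#", " ")               # keep the hashtag word, drop the symbol
    text = re.sub(r"[^A-Za-z\s]", " ", text)    # digits, punctuation, emoji
    return re.sub(r"\s+", " ", text).strip()    # collapse repeated whitespace

def case_fold(text):
    """Lowercase the whole string."""
    return text.lower()

def normalize_words(text, mapping=SLANG_MAP):
    """Replace each token found in the mapping with its standard form."""
    return " ".join(mapping.get(tok, tok) for tok in text.split())

def preprocess(tweets):
    return [normalize_words(case_fold(clean(t))) for t in remove_duplicates(tweets)]

raw = [
    "Filmnya BAGUS bgt!! https://t.co/abc @user",
    "Filmnya BAGUS bgt!! https://t.co/abc @user",
    "Gak suka sama #pelayanan yg lambat :(",
]
processed = preprocess(raw)
print(processed)
```

Deduplicating before cleaning follows the listed order; running it again after normalization would also catch tweets that differ only in casing or links.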

✨ Indonesian Sentiment Lexicon

By: Fajri Koto (GitHub @fajri91)

  • Sentiment labels: Positive, Negative, Neutral
  • Positive.tsv: 3,610 positive words
  • Negative.tsv: 6,608 negative words
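
One common way to derive the three labels from such a lexicon is to sum word weights per tweet and take the sign of the total. The sketch below assumes that scoring rule and assumes each `.tsv` row holds a word and an integer weight separated by a tab; neither is stated in the source, and the small inline lexicons are illustrative stand-ins for the real files.

```python
import csv
import io

def load_lexicon(handle):
    """Read `word<TAB>weight` rows into a dict (assumed file layout)."""
    lexicon = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2:
            continue
        try:
            lexicon[row[0]] = int(row[1])
        except ValueError:
            continue  # skip a header row or any row with a non-integer weight
    return lexicon

def label_tweet(text, positive, negative):
    """Sum lexicon weights over tokens; the sign of the total picks the label."""
    score = sum(positive.get(tok, 0) + negative.get(tok, 0) for tok in text.split())
    if score > 0:
        return "Positive"
    if score < 0:
        return "Negative"
    return "Neutral"

# Stand-ins for Positive.tsv and Negative.tsv (illustrative words and weights).
positive = load_lexicon(io.StringIO("word\tweight\nbagus\t4\nsenang\t3\n"))
negative = load_lexicon(io.StringIO("word\tweight\nlambat\t-3\nkecewa\t-4\n"))

labels = [
    label_tweet("filmnya bagus banget", positive, negative),
    label_tweet("pelayanan lambat dan kecewa", positive, negative),
    label_tweet("besok ada rapat", positive, negative),
]
print(labels)
```

With the real files, a handle from `open("Positive.tsv", encoding="utf-8", newline="")` could be passed to `load_lexicon` in place of the `io.StringIO` objects.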

✨ Dataset Split

  • Train: 80%
  • Validation: 10%
  • Test: 10%
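
A minimal sketch of an 80/10/10 split using only the standard library is shown below. The source does not say whether the split was stratified by label or how many tweets remained after preprocessing, so this assumes a plain seeded shuffle over all 15,027 collected tweets. Reusing seed 42 for the shuffle is also an assumption.

```python
import random

def split_dataset(items, seed=42):
    """Shuffle with a fixed seed, then cut into 80% train, 10% val, rest as test."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = len(items) * 80 // 100
    n_val = len(items) * 10 // 100
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]   # the remainder absorbs rounding leftovers
    return train, val, test

# Integer IDs stand in for the tweets.
train, val, test = split_dataset(range(15027))
print(len(train), len(val), len(test))
```

Under these assumptions the three parts hold 12,021, 1,502, and 1,504 items. The actual sizes would be smaller if duplicate removal dropped any tweets before the split.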

✨ IndoBERT Training Configuration

  • set_seed: 42
  • Model: indobenchmark/indobert-base-p1
  • Max Seq Length: 256
  • Batch Size: 32
  • Num_workers: 2
  • Optimizer: Adam
  • Learning Rate: 2e-5
  • Weight_decay: 0.02
  • Epochs: 5
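
The settings above can be gathered into a single object so that a training script reads them from one place. The sketch below is one possible arrangement, not the author's code: the `TrainingConfig` class, its field names, and the step-count helpers are assumptions, and the training-set size of 12,000 is hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class TrainingConfig:
    # Values copied from the configuration list; field names are illustrative.
    seed: int = 42
    model_name: str = "indobenchmark/indobert-base-p1"
    max_seq_length: int = 256
    batch_size: int = 32
    num_workers: int = 2
    optimizer: str = "Adam"
    learning_rate: float = 2e-5
    weight_decay: float = 0.02
    epochs: int = 5

    def steps_per_epoch(self, n_train: int) -> int:
        """Optimizer steps per epoch, keeping the last partial batch."""
        return math.ceil(n_train / self.batch_size)

    def total_steps(self, n_train: int) -> int:
        """Optimizer steps over the whole run."""
        return self.steps_per_epoch(n_train) * self.epochs

config = TrainingConfig()
# 12,000 is a hypothetical training-set size used only for illustration.
print(config.steps_per_epoch(12_000), config.total_steps(12_000))
```

In a PyTorch script these values would typically be passed on as, for example, `torch.optim.Adam(model.parameters(), lr=config.learning_rate, weight_decay=config.weight_decay)` and `DataLoader(dataset, batch_size=config.batch_size, num_workers=config.num_workers)`. The total step count is mainly useful if a learning-rate scheduler is added, which the source does not mention.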

Framework Versions

  • Transformers 4.51.3
  • PyTorch 2.6.0+cu124
  • Tokenizers 0.21.1