---
license: mit
language:
- id
metrics:
- accuracy
- f1
base_model:
- indobenchmark/indobert-base-p2
pipeline_tag: text-classification
library_name: transformers
tags:
- finance
- review
- produk
- ulasan
- e commerce
---

# IndoBERT Indonesian Sentiment Classification - 2 Classes

This model is a fine-tuned version of `indobenchmark/indobert-base-p2` for sentiment classification of Indonesian product reviews with two labels (positive and negative). The model was first further pre-trained on 78,536 lines of product reviews so that it better recognizes the informal style common in reviews, such as the abbreviations "banget" → "bgt" and "enggak" → "gk". It was then fine-tuned on a dataset of 19,728 product reviews from the Tokopedia e-commerce platform, supplemented with synthetic reviews generated with the Gemini LLM.

## 🎯 Purpose

The model classifies product reviews and comments into two sentiment classes: `POSITIF` and `NEGATIF`.

## 🧪 Metrics

The model is evaluated with the following metrics:
- Accuracy
- F1 Score

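Accuracy and F1 are typically computed during training through a `compute_metrics` callback passed to the `Trainer`. The sketch below is illustrative, not the original training script: it assumes binary labels with label 1 as the positive class, and implements both metrics in plain NumPy.

```python
import numpy as np

def compute_metrics(eval_pred):
    """Accuracy and binary F1 from (logits, labels), as a Trainer callback receives them."""
    logits, labels = eval_pred
    preds = np.argmax(np.asarray(logits), axis=-1)
    labels = np.asarray(labels)
    accuracy = float((preds == labels).mean())
    # Binary F1, treating label 1 as the positive class
    tp = int(((preds == 1) & (labels == 1)).sum())
    fp = int(((preds == 1) & (labels == 0)).sum())
    fn = int(((preds == 0) & (labels == 1)).sum())
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "f1": f1}
```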
## 🧠 Base Model

- [`indobenchmark/indobert-base-p2`](https://huggingface.co/indobenchmark/indobert-base-p2)

## ⚙️ Training Arguments

The training configuration (`TrainingArguments`) used during fine-tuning:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=2,
    per_device_eval_batch_size=4,
    weight_decay=0.05,
    eval_strategy="epoch",
    save_strategy="epoch",
    seed=42,
    load_best_model_at_end=True,
    metric_for_best_model="f1",
    logging_dir="./logs",
    report_to="tensorboard",
    logging_steps=100,
    warmup_ratio=0.05,
)
```

## 📊 Evaluation Results

The model was evaluated over 3 epochs using Accuracy and F1 Score. Per-epoch performance:

| Epoch | Training Loss | Validation Loss | Accuracy | F1 Score |
|-------|---------------|-----------------|----------|----------|
| 1     | 0.2670        | 0.2374          | 0.9564   | 0.9564   |
| 2     | 0.4904        | 0.2951          | 0.9356   | 0.9356   |
| 3     | 0.3650        | 0.2176          | 0.9442   | 0.9441   |

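Because the training arguments set `load_best_model_at_end=True` with `metric_for_best_model="f1"`, the checkpoint that survives training is the one with the highest validation F1 — here, the epoch-1 checkpoint. A quick sketch of that selection over the numbers from the table:

```python
# (epoch, accuracy, f1) rows from the evaluation table
results = [
    (1, 0.9564, 0.9564),
    (2, 0.9356, 0.9356),
    (3, 0.9442, 0.9441),
]

# load_best_model_at_end keeps the checkpoint with the highest F1
best_epoch, best_acc, best_f1 = max(results, key=lambda row: row[2])
print(best_epoch, best_f1)  # epoch 1, F1 = 0.9564
```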
## 👨‍💻 Inference

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline
import torch

# Load the model and tokenizer from the Hugging Face Hub
model = AutoModelForSequenceClassification.from_pretrained("siRendy/indobert-analisis-sentimen-review-produk-datathon2025")
tokenizer = AutoTokenizer.from_pretrained("siRendy/indobert-analisis-sentimen-review-produk-datathon2025")

# Build the pipeline once so repeated predictions do not recreate it
classifier = pipeline(
    "text-classification",
    model=model,
    tokenizer=tokenizer,
    device=0 if torch.cuda.is_available() else -1
)

# Prediction helper
def predict_sentiment(text):
    result = classifier(text)[0]
    return {
        "sentiment": str(result["label"]),
        "confidence": round(result["score"], 4)
    }

# Example usage
text = "aih jelek nya ngawur iki opo seh, mo nipu lu ya"
prediction = predict_sentiment(text)
print(prediction)
```
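The `score` the pipeline returns is the softmax probability of the predicted label, so `confidence` always falls between 0 and 1. A small sketch of how that probability comes out of raw model logits (the logit values below are made up for illustration):

```python
import math

def softmax(logits):
    # Numerically stable softmax: shift by the max before exponentiating
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for [NEGATIF, POSITIF]
probs = softmax([-1.0, 2.0])
confidence = round(max(probs), 4)  # the pipeline's "score" for the predicted label
```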