---
license: apache-2.0
---

# Qwen3-8B-Korean-Sentiment

## Overview

This repository contains a fine-tuned large language model (LLM) for **Korean sentiment analysis (한국어 감정 분석)**, designed specifically for Korean-language **YouTube comments**. The model classifies each comment as **Positive**, **Negative**, or **Neutral**. Beyond explicit emotion, it is fine-tuned to detect subtler signals such as **irony (반어법)** and **sarcasm (풍자)**, which are common in Korean-language content. The weights are distributed as a PEFT adapter on top of Qwen3-8B.

### Sentiment Classification

- **Positive (긍정)**
- **Negative (부정)**
- **Neutral (중립)**

The model is prompted to answer with the Korean label shown in parentheses.

## Quickstart

To get started, run the following Python code. It requires `torch`, `peft`, and `transformers` (Qwen3 support requires `transformers` 4.51 or later), plus a CUDA-capable GPU:

```python
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

# Load the fine-tuned adapter together with the base model it references,
# plus the Qwen3-8B tokenizer. bfloat16 roughly halves GPU memory versus float32.
model = AutoPeftModelForCausalLM.from_pretrained(
    "suil0109/Qwen3-8B-Korean-Sentiment", torch_dtype=torch.bfloat16
).to("cuda")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B")

model.eval()

# Sample comment: "This is so good!"
comment = "이거 너무 좋아요!"

# Format the prompt. In English:
#   "The following is a YouTube comment. Please classify the comment's sentiment.
#    Comment: {comment}
#    Output exactly one of the following: Positive / Negative / Neutral"
prompt = (
    "다음은 유튜브 댓글입니다. 댓글의 감정을 분류해 주세요.\n\n"
    f"댓글: {comment}\n\n"
    "반드시 다음 중 하나만 출력하세요: 긍정 / 부정 / 중립"
)

# Tokenize the input and move input_ids and attention_mask to the model's device
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate the prediction
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=512)

# Decode only the newly generated tokens so the prompt is not echoed back
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```
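
When classifying many comments, it can help to wrap the prompt and the output parsing in small functions. The sketch below is hypothetical and not part of this repository: `build_prompt` reproduces the Quickstart prompt, and `parse_sentiment` maps the earliest Korean label in the generated text to its English class, returning `None` if no label appears. It assumes the reply contains 긍정, 부정, or 중립 as plain text, and it discards anything up to a closing `</think>` tag in case Qwen3's thinking output shows up.

```python
from typing import Optional

# Korean labels requested by the prompt, mapped to the English class names.
LABELS = {"긍정": "Positive", "부정": "Negative", "중립": "Neutral"}


def build_prompt(comment: str) -> str:
    """Return the Korean classification prompt used in the Quickstart."""
    return (
        "다음은 유튜브 댓글입니다. 댓글의 감정을 분류해 주세요.\n\n"
        f"댓글: {comment}\n\n"
        "반드시 다음 중 하나만 출력하세요: 긍정 / 부정 / 중립"
    )


def parse_sentiment(generated: str) -> Optional[str]:
    """Map the earliest Korean label in the model reply to English, or None."""
    # Assumption: if a <think>...</think> block is present, the answer follows it.
    if "</think>" in generated:
        generated = generated.rsplit("</think>", 1)[1]
    # Position of every label that occurs in the remaining text.
    positions = {label: generated.find(label) for label in LABELS if label in generated}
    if not positions:
        return None
    earliest = min(positions, key=positions.get)
    return LABELS[earliest]


print(parse_sentiment("긍정"))                         # Positive
print(parse_sentiment("<think>\n\n</think>\n\n부정"))  # Negative
print(parse_sentiment("잘 모르겠습니다"))               # None
```

Apply `parse_sentiment` to the decoded new tokens only: decoding the full output sequence would include the prompt, which itself lists all three labels.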

## Train/Test Details

- **Training Dataset**: Fine-tuned on **3,857** labeled YouTube comments for sentiment classification.
- **Testing Dataset**: Evaluated on **1,180** labeled YouTube comments.

## Results

The fine-tuned model's per-class performance on the test set is summarized below:

| Metric        | Positive (긍정) | Neutral (중립) | Negative (부정) |
|---------------|-----------------|----------------|-----------------|
| **Precision** | 0.8981          | 0.3787         | 0.4971          |
| **Recall**    | 0.7362          | 0.2880         | 0.7413          |
| **F1-Score**  | 0.8092          | 0.3272         | 0.5951          |
| **Support**   | 527             | 309            | 344             |

**Support** is the number of test comments whose true label is that class.

**Accuracy**: 62.03% (based on 1,180 samples)
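
As a consistency check, the overall accuracy can be recovered from the table. Each class's correct predictions (TP) equal recall × support, the number of comments predicted as that class equals TP ÷ precision, and F1 reduces to 2·TP / (predicted + support). This sketch uses only the published, rounded values, so counts are rounded to whole comments; it is a reconstruction, not the team's evaluation script.

```python
# Per-class metrics copied from the Results table.
table = {
    "Positive": {"precision": 0.8981, "recall": 0.7362, "support": 527},
    "Neutral": {"precision": 0.3787, "recall": 0.2880, "support": 309},
    "Negative": {"precision": 0.4971, "recall": 0.7413, "support": 344},
}

total = sum(row["support"] for row in table.values())

# True positives = recall * support; predicted count = true positives / precision.
# Both are rounded to whole comments because the table values are rounded.
tp = {name: round(row["recall"] * row["support"]) for name, row in table.items()}
predicted = {name: round(tp[name] / row["precision"]) for name, row in table.items()}

# Accuracy = all correct predictions / all test comments.
accuracy = sum(tp.values()) / total

# F1 = 2 * TP / (predicted + support), equivalent to the harmonic mean of P and R.
f1 = {
    name: round(2 * tp[name] / (predicted[name] + row["support"]), 4)
    for name, row in table.items()
}

print(total)              # 1180
print(tp)                 # {'Positive': 388, 'Neutral': 89, 'Negative': 255}
print(predicted)          # {'Positive': 432, 'Neutral': 235, 'Negative': 513}
print(f"{accuracy:.2%}")  # 62.03%
print(f1)                 # {'Positive': 0.8092, 'Neutral': 0.3272, 'Negative': 0.5951}
```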

For detailed results, see the [LLM-SocialMedia repository](https://github.com/suil0109/LLM-SocialMedia/tree/main/huggingface).

## Contact

For any inquiries or feedback, feel free to contact the team:

- **Email**: [email protected]

**Team**:
- Hanjun Jung
- Jinsoo Kim
- Junhyeok Choi
- Suil Lee
- Seongjae Kang