suil0109 committed (verified) · Commit 28d1cac · Parent(s): 52f0e49

Update README.md

Files changed (1): README.md (+81 -3)

---
license: apache-2.0
---

# Qwen3-8B-Korean-Sentiment

## Overview

This repository contains a **Large Language Model (LLM)** fine-tuned for **Korean sentiment analysis (한국어 감정 분석)** of **YouTube comments**. The model classifies each comment as **Positive**, **Negative**, or **Neutral**, and is trained to detect not only directly expressed emotions but also subtler cues such as **irony (반어법)** and **sarcasm (풍자)** that are common in Korean-language content.

### Sentiment Classes
- **Positive (긍정)**
- **Negative (부정)**
- **Neutral (중립)**
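
These Korean label strings are exactly what the model is prompted to output (see the Quickstart below). As a minimal illustrative sketch, not part of the released code, they could be mapped to English names during post-processing:

```python
# Illustrative helper (an assumption, not from the original repository): map the
# Korean sentiment labels emitted by the model to their English names.
LABEL_MAP = {
    "긍정": "Positive",
    "부정": "Negative",
    "중립": "Neutral",
}

def to_english_label(model_output: str) -> str:
    """Return the English name of the first Korean label found, else 'Unknown'."""
    for korean, english in LABEL_MAP.items():
        if korean in model_output:
            return english
    return "Unknown"
```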

## Quickstart

To quickly get started with the fine-tuned model, use the following Python code:

```python
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

# Load the model and tokenizer
model = AutoPeftModelForCausalLM.from_pretrained("suil0109/Qwen3-8B-Korean-Sentiment").to("cuda")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B")

model.eval()

# Sample comment
comment = "이거 너무 좋아요!"

# Format the prompt
prompt = (
    "다음은 유튜브 댓글입니다. 댓글의 감정을 분류해 주세요.\n\n"
    f"댓글: {comment}\n\n"
    "반드시 다음 중 하나만 출력하세요: 긍정 / 부정 / 중립"
)

# Tokenize the input
inputs = tokenizer(prompt, return_tensors="pt")

# Generate prediction
outputs = model.generate(input_ids=inputs["input_ids"].to("cuda"), max_new_tokens=512)

# Decode and print the output
print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])
```
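
The tokenizer is loaded from the base model `Qwen/Qwen3-8B`, while the fine-tuned PEFT adapter comes from this repository. Note that `generate` returns the prompt tokens followed by the newly generated ones, so the decoded string above includes the prompt. Continuing from the Quickstart snippet, a minimal sketch (an illustrative assumption, not part of the original code) for extracting only the predicted label:

```python
# Decode only the newly generated tokens, then look for one of the three
# Korean labels. This parsing logic is an illustrative assumption.
prompt_len = inputs["input_ids"].shape[1]
generated = tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True)

label = next((lab for lab in ("긍정", "부정", "중립") if lab in generated), None)
print(label)  # one of "긍정" / "부정" / "중립", or None if no label was found
```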

## Train/Test Details

- **Training dataset**: **3,857** labeled YouTube comments used for sentiment fine-tuning.
- **Test dataset**: **1,130** labeled YouTube comments used to evaluate the model.

## Results

The fine-tuned model's performance on the sentiment classification task is summarized below:

| Metric        | Positive (긍정) | Neutral (중립) | Negative (부정) |
|---------------|-----------------|----------------|-----------------|
| **Precision** | 0.8981          | 0.3787         | 0.4971          |
| **Recall**    | 0.7362          | 0.2880         | 0.7413          |
| **F1-Score**  | 0.8092          | 0.3272         | 0.5951          |
| **Support**   | 527             | 309            | 344             |

**Accuracy**: 62.03% (based on 1,180 samples)
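
For reference, per-class precision, recall, F1, and support in this form can be computed with scikit-learn. A minimal sketch, assuming hypothetical lists `y_true` and `y_pred` that hold the gold and predicted labels (긍정 / 부정 / 중립) for each test comment:

```python
# Illustrative sketch: assumes scikit-learn is installed and that y_true and
# y_pred were collected by running the Quickstart inference over the test set.
from sklearn.metrics import accuracy_score, classification_report

y_true = ["긍정", "부정", "중립", "긍정"]  # hypothetical gold labels
y_pred = ["긍정", "부정", "부정", "긍정"]  # hypothetical model predictions

print(classification_report(y_true, y_pred, digits=4))
print("Accuracy:", accuracy_score(y_true, y_pred))
```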

You can find detailed results [here](https://github.com/suil0109/LLM-SocialMedia/tree/main/huggingface).

## Contact

For any inquiries or feedback, feel free to contact the team:

- **Email**: [email protected]

**Team**:
- Hanjun Jung
- Jinsoo Kim
- Junhyeok Choi
- Suil Lee
- Seongjae Kang