|
--- |
|
datasets: |
|
- e9t/nsmc |
|
language: |
|
- ko |
|
metrics: |
|
- accuracy |
|
pipeline_tag: text-classification |
|
--- |
|
## Model Description |
|
|
|
- **Finetuned from:** [klue/bert-base](https://huggingface.co/klue/bert-base)

- Test accuracy on NSMC: **0.9041**
|
|
|
## Uses |
|
|
|
- Binary sentiment analysis of Korean movie reviews (NSMC)
|
|
|
## How to Get Started with the Model |
|
|
|
```python |
|
from transformers import AutoTokenizer, AutoModelForSequenceClassification |
|
|
|
tokenizer = AutoTokenizer.from_pretrained("seongyeon1/klue-base-finetuned-nsmc") |
|
model = AutoModelForSequenceClassification.from_pretrained("seongyeon1/klue-base-finetuned-nsmc") |
|
``` |
|
|
|
```python |
|
from transformers import pipeline |
|
|
|
pipe = pipeline("text-classification", model="seongyeon1/klue-base-finetuned-nsmc") |
|
pipe("진짜 별로더라")  # "It was really bad" → [{'label': 'LABEL_0', 'score': 0.999700665473938}]

pipe("굿굿")  # "Good, good" → [{'label': 'LABEL_1', 'score': 0.9875587224960327}]
|
|
|
``` |
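The generic `LABEL_0`/`LABEL_1` names follow NSMC's label convention (0 = negative, 1 = positive), as the examples above suggest. A small helper (hypothetical, not part of the model) can rename them for display:

```python
# Map the generic pipeline labels to readable names.
# Assumes LABEL_0 = negative and LABEL_1 = positive, per NSMC's convention.
LABEL_NAMES = {"LABEL_0": "negative", "LABEL_1": "positive"}

def readable(prediction):
    """Return a copy of a pipeline prediction with a human-readable label."""
    return {"label": LABEL_NAMES[prediction["label"]], "score": prediction["score"]}

print(readable({"label": "LABEL_0", "score": 0.9997}))
# {'label': 'negative', 'score': 0.9997}
```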
|
|
|
## Training Details |
|
|
|
### Training Data |
|
|
|
- [NSMC (Naver Sentiment Movie Corpus)](https://huggingface.co/datasets/e9t/nsmc): Korean movie reviews labeled 0 (negative) or 1 (positive)
|
```python |
|
from datasets import load_dataset |
|
|
|
dataset = load_dataset('e9t/nsmc')
|
``` |
|
|
|
#### Preprocessing |
|
|
|
- BERT's default maximum sequence length is 512, but padding every review to that length wastes training time.

- Based on the review length distribution below, `max_length` is set to 55.
|
![image/png](https://cdn-uploads.huggingface.co/production/uploads/634330a304d4ff28aeb8de56/t7axSlo4JI4bPLynUB3OP.png) |
|
|
|
```python |
|
maxlen = 55  # chosen from the length distribution above

def tokenize_function_with_max(examples, maxlen=maxlen):
    # Uses the tokenizer loaded in "How to Get Started with the Model"
    encodings = tokenizer(examples['document'], max_length=maxlen, truncation=True, padding='max_length')
    return encodings
|
``` |
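What `truncation=True` with `padding='max_length'` does to each example can be sketched in plain Python (toy token ids, not the real tokenizer output):

```python
def pad_and_truncate(ids, maxlen=55, pad_id=0):
    # truncation=True: cut sequences longer than maxlen
    ids = ids[:maxlen]
    # padding='max_length': right-pad shorter sequences up to maxlen
    return ids + [pad_id] * (maxlen - len(ids))

short_seq = pad_and_truncate([101, 9521, 102])   # a 3-token example
long_seq = pad_and_truncate(list(range(100)))    # a 100-token example
print(len(short_seq), len(long_seq))  # 55 55
```

Every batch element ends up exactly 55 ids long, which is what lets the model train on fixed-shape tensors.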
|
|
|
#### Training Hyperparameters |
|
|
|
- learning rate: 2e-5
- weight decay: 0.01
- batch size: 32
- epochs: 2
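As a back-of-the-envelope check (assuming NSMC's 150,000-example train split and no gradient accumulation), these settings imply the following number of optimizer steps:

```python
import math

train_size = 150_000  # NSMC train split size (assumption; see the dataset card)
batch_size = 32
epochs = 2

steps_per_epoch = math.ceil(train_size / batch_size)  # 4688
total_steps = steps_per_epoch * epochs
print(total_steps)  # 9376
```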
|
|
|
#### Metrics |
|
|
|
- **accuracy**

- The NSMC label distribution is roughly balanced between positive and negative (see below), so accuracy is an appropriate metric.
|
|
|
![image/png](https://cdn-uploads.huggingface.co/production/uploads/634330a304d4ff28aeb8de56/_S5TTyec8I25Kx-yaqeJo.png) |
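The balance claim can be verified by counting labels; a minimal sketch, with a toy list standing in for the real `dataset['train']['label']` values:

```python
from collections import Counter

# Toy stand-in for dataset['train']['label']; 0 = negative, 1 = positive.
labels = [0, 1, 1, 0, 0, 1, 1, 0]

counts = Counter(labels)
positive_ratio = counts[1] / len(labels)
print(positive_ratio)  # 0.5
```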
|
|
|
#### Result |
|
|
|
```python
{'eval_loss': 0.2575262784957886,
 'eval_accuracy': 0.9041,
 'eval_runtime': 163.2129,
 'eval_samples_per_second': 306.348,
 'eval_steps_per_second': 9.576,
 'epoch': 2.0}
```
|
|