|
--- |
|
datasets: |
|
- e9t/nsmc |
|
language: |
|
- ko |
|
metrics: |
|
- accuracy |
|
pipeline_tag: text-classification |
|
--- |
|
## Model Description |
|
|
|
- **Finetuned from:** [klue/bert-base](https://huggingface.co/klue/bert-base)

- Test accuracy on NSMC: **0.9041**
|
|
|
## Uses |
|
|
|
- Binary sentiment analysis of Korean movie reviews (NSMC)
|
|
|
## How to Get Started with the Model |
|
|
|
```python |
|
from transformers import AutoTokenizer, AutoModelForSequenceClassification |
|
|
|
tokenizer = AutoTokenizer.from_pretrained("seongyeon1/klue-base-finetuned-nsmc") |
|
model = AutoModelForSequenceClassification.from_pretrained("seongyeon1/klue-base-finetuned-nsmc") |
|
``` |
|
|
|
```python |
|
from transformers import pipeline |
|
|
|
pipe = pipeline("text-classification", model="seongyeon1/klue-base-finetuned-nsmc") |
|
pipe("진짜 별로더라")  # "It was really bad" → [{'label': 'LABEL_0', 'score': 0.999700665473938}]

pipe("굿굿")  # "Good, good" → [{'label': 'LABEL_1', 'score': 0.9875587224960327}]
|
|
|
``` |
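The generic `LABEL_0`/`LABEL_1` names follow NSMC's label convention (0 = negative, 1 = positive), as the examples above suggest. A small helper (hypothetical, not part of the model) can rename them for display:

```python
# Map the generic pipeline labels to readable names.
# Assumes LABEL_0 = negative and LABEL_1 = positive, per NSMC's convention.
LABEL_NAMES = {"LABEL_0": "negative", "LABEL_1": "positive"}

def readable(prediction):
    """Return a copy of a pipeline prediction with a human-readable label."""
    return {"label": LABEL_NAMES[prediction["label"]], "score": prediction["score"]}

print(readable({"label": "LABEL_0", "score": 0.9997}))
# {'label': 'negative', 'score': 0.9997}
```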
|
|
|
## Training Details |
|
|
|
### Training Data |
|
|
|
- [NSMC (Naver Sentiment Movie Corpus)](https://huggingface.co/datasets/e9t/nsmc): Korean movie reviews labeled 0 (negative) or 1 (positive)
|
```python |
|
from datasets import load_dataset |
|
|
|
dataset = load_dataset('e9t/nsmc')
|
``` |
|
|
|
#### Preprocessing |
|
|
|
- BERT's default maximum sequence length is 512, but padding every review to that length wastes training time.

- Based on the review length distribution below, `max_length` is set to 55.
|
![image/png](https://cdn-uploads.huggingface.co/production/uploads/634330a304d4ff28aeb8de56/t7axSlo4JI4bPLynUB3OP.png) |
|
|
|
```python |
|
maxlen = 55  # chosen from the length distribution above

def tokenize_function_with_max(examples, maxlen=maxlen):
    # Uses the tokenizer loaded in "How to Get Started with the Model"
    encodings = tokenizer(examples['document'], max_length=maxlen, truncation=True, padding='max_length')
    return encodings
|
``` |
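What `truncation=True` with `padding='max_length'` does to each example can be sketched in plain Python (toy token ids, not the real tokenizer output):

```python
def pad_and_truncate(ids, maxlen=55, pad_id=0):
    # truncation=True: cut sequences longer than maxlen
    ids = ids[:maxlen]
    # padding='max_length': right-pad shorter sequences up to maxlen
    return ids + [pad_id] * (maxlen - len(ids))

short_seq = pad_and_truncate([101, 9521, 102])   # a 3-token example
long_seq = pad_and_truncate(list(range(100)))    # a 100-token example
print(len(short_seq), len(long_seq))  # 55 55
```

Every batch element ends up exactly 55 ids long, which is what lets the model train on fixed-shape tensors.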
|
|
|
#### Training Hyperparameters |
|
|
|
- learning rate: 2e-5
- weight decay: 0.01
- batch size: 32
- epochs: 2
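As a back-of-the-envelope check (assuming NSMC's 150,000-example train split and no gradient accumulation), these settings imply the following number of optimizer steps:

```python
import math

train_size = 150_000  # NSMC train split size (assumption; see the dataset card)
batch_size = 32
epochs = 2

steps_per_epoch = math.ceil(train_size / batch_size)  # 4688
total_steps = steps_per_epoch * epochs
print(total_steps)  # 9376
```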
|
|
|
#### Metrics |
|
|
|
- **accuracy**

- The NSMC label distribution is roughly balanced between positive and negative (see below), so accuracy is an appropriate metric.
|
|
|
![image/png](https://cdn-uploads.huggingface.co/production/uploads/634330a304d4ff28aeb8de56/_S5TTyec8I25Kx-yaqeJo.png) |
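The balance claim can be verified by counting labels; a minimal sketch, with a toy list standing in for the real `dataset['train']['label']` values:

```python
from collections import Counter

# Toy stand-in for dataset['train']['label']; 0 = negative, 1 = positive.
labels = [0, 1, 1, 0, 0, 1, 1, 0]

counts = Counter(labels)
positive_ratio = counts[1] / len(labels)
print(positive_ratio)  # 0.5
```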
|
|
|
#### Result |
|
|
|
```python
{'eval_loss': 0.2575262784957886,
 'eval_accuracy': 0.9041,
 'eval_runtime': 163.2129,
 'eval_samples_per_second': 306.348,
 'eval_steps_per_second': 9.576,
 'epoch': 2.0}
```
|
|