Model Card for Sentence Type Classification
This model is fine-tuned to classify Korean financial sentences into four categories: Predictive, Inferential, Factual, and Conversational. It is built on jhgan/ko-sroberta-multitask, a RoBERTa-based Korean sentence-embedding model.
Model Details
Model Description
- Developed by: Kwon Cho
- Shared by: kwoncho
- Model type: RoBERTa-based transformer (fine-tuned for sequence classification)
- Language(s): Korean (한국어)
- License: Apache 2.0 (from base model)
- Finetuned from model: jhgan/ko-sroberta-multitask
This model was fine-tuned for multi-class classification using supervised learning with Hugging Face Transformers and PyTorch.
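The training script is not included in this card; the sketch below shows how a comparable fine-tuning run could be set up with the Hugging Face Trainer API. The dataset files, column names, and hyperparameters are placeholders, not the values used for this checkpoint.

```python
# Minimal fine-tuning sketch (assumed setup; not the original training script).
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

base = "jhgan/ko-sroberta-multitask"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForSequenceClassification.from_pretrained(base, num_labels=4)

# Hypothetical CSV files with "text" and "label" (0-3) columns.
dataset = load_dataset("csv", data_files={"train": "train.csv", "validation": "valid.csv"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

dataset = dataset.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="sentence_type_classification",
    num_train_epochs=3,
    per_device_train_batch_size=32,
    learning_rate=2e-5,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
)
trainer.train()
```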
Model Sources
- Repository: [More Information Needed]
- Demo: [More Information Needed]
Uses
Direct Use
The model can be used to classify financial sentences (in Korean) into one of the following categories:
- Predictive (예측형)
- Inferential (추론형)
- Factual (사실형)
- Conversational (대화형)
Training Data
- Dataset Name: 문장 유형(추론, 예측 등) 판단 데이터 (Sentence Type (Inference, Prediction, etc.) Judgment Data)
- Source: AIHub ([More Information Needed])

This dataset labels Korean financial sentences with one of the following four types:
- 예측형 (Predictive)
- 추론형 (Inferential)
- 사실형 (Factual)
- 대화형 (Conversational)
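The label order used when the dataset was encoded for training is not documented here; the mapping below is only a plausible assumption that illustrates how the four Korean type names relate to integer class ids (check `model.config.id2label` on the released checkpoint for the actual order).

```python
# Assumed label mapping -- the actual id order of this checkpoint may differ.
label2id = {
    "사실형": 0,  # Factual
    "추론형": 1,  # Inferential
    "예측형": 2,  # Predictive
    "대화형": 3,  # Conversational
}
id2label = {i: name for name, i in label2id.items()}
```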
Out-of-Scope Use
- Not suitable for general-purpose Korean sentence classification outside financial or economic contexts.
- May not perform well on informal or highly colloquial text.
Bias, Risks, and Limitations
- The model may carry biases present in the training dataset.
- Misclassifications could have downstream implications if used for investment recommendations or financial analysis without verification.
Recommendations
Use this model in conjunction with human oversight, especially for high-stakes or production-level applications.
How to Get Started with the Model
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("kwoncho/sentence_type_classification")
model = AutoModelForSequenceClassification.from_pretrained("kwoncho/sentence_type_classification")

# Example sentence: "This stock may fall in the short term."
text = "해당 종목은 단기적으로 하락할 가능성이 있습니다."
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)

with torch.no_grad():
    outputs = model(**inputs)

# The predicted sentence type is the index of the largest logit.
predicted_id = outputs.logits.argmax(dim=-1).item()
print(model.config.id2label[predicted_id])
```
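For quick experiments, the same checkpoint can also be used through the text-classification pipeline, which handles tokenization internally and returns the label name stored in the model config:

```python
from transformers import pipeline

# The returned label string depends on the id2label mapping saved with the checkpoint.
classifier = pipeline("text-classification", model="kwoncho/sentence_type_classification")
print(classifier("해당 종목은 단기적으로 하락할 가능성이 있습니다."))
```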