Model Card for Sentence Type Classification

This model is fine-tuned to classify Korean financial sentences into four categories: Predictive, Inferential, Factual, and Conversational. It is built on jhgan/ko-sroberta-multitask, a Korean RoBERTa-based sentence-embedding model.

Model Details

Model Description

  • Developed by: Kwon Cho
  • Shared by: kwoncho
  • Model type: RoBERTa-based transformer (fine-tuned for sequence classification)
  • Language(s): Korean (ํ•œ๊ตญ์–ด)
  • License: Apache 2.0 (from base model)
  • Finetuned from model: jhgan/ko-sroberta-multitask

This model was fine-tuned for multi-class classification using supervised learning with Hugging Face Transformers and PyTorch.
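
A minimal sketch of what that fine-tuning could look like with the Trainer API. The card does not document the actual hyperparameters, splits, or preprocessing, so the values and the train_dataset/eval_dataset handles below are illustrative assumptions only:

from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

# Hypothetical setup: learning rate, epochs, and batch size are illustrative,
# not the settings actually used for this model.
tokenizer = AutoTokenizer.from_pretrained("jhgan/ko-sroberta-multitask")
model = AutoModelForSequenceClassification.from_pretrained(
    "jhgan/ko-sroberta-multitask",
    num_labels=4,  # Predictive, Inferential, Factual, Conversational
)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

# train_dataset / eval_dataset are assumed to be Hugging Face Datasets with a
# "text" column and integer "labels" (0-3), tokenized via dataset.map(tokenize, batched=True).
args = TrainingArguments(
    output_dir="sentence_type_classification",
    num_train_epochs=3,
    per_device_train_batch_size=32,
    learning_rate=2e-5,
)
trainer = Trainer(model=model, args=args, tokenizer=tokenizer,
                  train_dataset=train_dataset, eval_dataset=eval_dataset)
trainer.train()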

Model Sources

  • Repository: [More Information Needed]
  • Demo: [More Information Needed]

Uses

Direct Use

The model can be used to classify Korean financial sentences into one of the following categories (see the usage sketch after the list):

  • Predictive (์˜ˆ์ธกํ˜•)
  • Inferential (์ถ”๋ก ํ˜•)
  • Factual (์‚ฌ์‹คํ˜•)
  • Conversational (๋Œ€ํ™”ํ˜•)
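
For a quick check, the model can also be loaded through the pipeline API. A minimal sketch; note that the returned label names depend on the id2label mapping stored in the model's config and may appear as generic LABEL_0 through LABEL_3 if that mapping was not set:

from transformers import pipeline

classifier = pipeline("text-classification", model="kwoncho/sentence_type_classification")
# Example sentence: "Interest rates are expected to rise next quarter."
print(classifier("다음 분기에는 금리가 인상될 것으로 예상됩니다."))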

Training Data

  • Dataset Name: 문장 유형(추론, 예측 등) 판단 데이터 (Sentence Type Judgment Data — inference, prediction, etc.)
  • Source: AIHub

์ด ๋ฐ์ดํ„ฐ๋Š” ํ•œ๊ตญ์–ด ๊ธˆ์œต ๋ฌธ์žฅ์„ ๋‹ค์Œ ๋„ค ๊ฐ€์ง€ ์œ ํ˜•์œผ๋กœ ๋ถ„๋ฅ˜ํ•ฉ๋‹ˆ๋‹ค:

  • ์˜ˆ์ธกํ˜• (Predictive)
  • ์ถ”๋ก ํ˜• (Inferential)
  • ์‚ฌ์‹คํ˜• (Factual)
  • ๋Œ€ํ™”ํ˜• (Conversational)

Out-of-Scope Use

  • Not suitable for general-purpose Korean sentence classification outside financial or economic contexts.
  • May not perform well on informal or highly colloquial text.

Bias, Risks, and Limitations

  • The model may carry biases present in the training dataset.
  • Misclassifications could have downstream implications if used for investment recommendations or financial analysis without verification.

Recommendations

Use this model in conjunction with human oversight, especially for high-stakes or production-level applications.
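
One pragmatic way to implement that oversight is to route low-confidence predictions to a human reviewer. A minimal sketch, assuming the model and tokenizer loaded as in the next section; the 0.8 softmax cutoff is an arbitrary illustrative value, not a validated setting:

import torch
import torch.nn.functional as F

def classify_with_review(text, model, tokenizer, threshold=0.8):
    """Return (label_id, confidence); flag the input for human review below the threshold."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        probs = F.softmax(model(**inputs).logits, dim=-1)
    confidence, label_id = probs.max(dim=-1)
    if confidence.item() < threshold:
        print(f"Low confidence ({confidence.item():.2f}) - route to human review")
    return label_id.item(), confidence.item()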

How to Get Started with the Model

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("kwoncho/sentence_type_classification")
model = AutoModelForSequenceClassification.from_pretrained("kwoncho/sentence_type_classification")

text = "해당 종목은 단기적으로 하락할 가능성이 있습니다."  # "This stock may decline in the short term."
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# Pick the highest-scoring class; id2label (if set in the config) maps the index to a label name.
predicted_id = outputs.logits.argmax(dim=-1).item()
print(model.config.id2label[predicted_id])
Technical Specifications

  • Format: Safetensors
  • Model size: 111M parameters
  • Tensor type: F32