Model Card for Sentence Type Classification

This model is fine-tuned to classify Korean financial sentences into four categories: Predictive, Inferential, Factual, and Conversational. It is built on jhgan/ko-sroberta-multitask, a Korean RoBERTa-based sentence-embedding model.

Model Details

Model Description

  • Developed by: Kwon Cho
  • Shared by: kwoncho
  • Model type: RoBERTa-based transformer (fine-tuned for sequence classification)
  • Language(s): Korean (ํ•œ๊ตญ์–ด)
  • License: Apache 2.0 (from base model)
  • Finetuned from model: jhgan/ko-sroberta-multitask

This model was fine-tuned for multi-class classification using supervised learning with Hugging Face Transformers and PyTorch.
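
A minimal sketch of what that fine-tuning could look like with the Trainer API. The card does not document the actual hyperparameters, splits, or preprocessing, so the values and the train_dataset/eval_dataset handles below are illustrative assumptions only:

from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

# Hypothetical setup: learning rate, epochs, and batch size are illustrative,
# not the settings actually used for this model.
tokenizer = AutoTokenizer.from_pretrained("jhgan/ko-sroberta-multitask")
model = AutoModelForSequenceClassification.from_pretrained(
    "jhgan/ko-sroberta-multitask",
    num_labels=4,  # Predictive, Inferential, Factual, Conversational
)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

# train_dataset / eval_dataset are assumed to be Hugging Face Datasets with a
# "text" column and integer "labels" (0-3), tokenized via dataset.map(tokenize, batched=True).
args = TrainingArguments(
    output_dir="sentence_type_classification",
    num_train_epochs=3,
    per_device_train_batch_size=32,
    learning_rate=2e-5,
)
trainer = Trainer(model=model, args=args, tokenizer=tokenizer,
                  train_dataset=train_dataset, eval_dataset=eval_dataset)
trainer.train()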

Model Sources

  • Repository: [More Information Needed]
  • Demo: [More Information Needed]

Uses

Direct Use

The model can be used to classify Korean financial sentences into one of the following categories (see the usage sketch after the list):

  • Predictive (์˜ˆ์ธกํ˜•)
  • Inferential (์ถ”๋ก ํ˜•)
  • Factual (์‚ฌ์‹คํ˜•)
  • Conversational (๋Œ€ํ™”ํ˜•)
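
For a quick check, the model can also be loaded through the pipeline API. A minimal sketch; note that the returned label names depend on the id2label mapping stored in the model's config and may appear as generic LABEL_0 through LABEL_3 if that mapping was not set:

from transformers import pipeline

classifier = pipeline("text-classification", model="kwoncho/sentence_type_classification")
# Example sentence: "Interest rates are expected to rise next quarter."
print(classifier("다음 분기에는 금리가 인상될 것으로 예상됩니다."))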

Training Data

  • Dataset Name: 문장 유형(추론, 예측 등) 판단 데이터 (Sentence Type Judgment Data — inference, prediction, etc.)
  • Source: AIHub

์ด ๋ฐ์ดํ„ฐ๋Š” ํ•œ๊ตญ์–ด ๊ธˆ์œต ๋ฌธ์žฅ์„ ๋‹ค์Œ ๋„ค ๊ฐ€์ง€ ์œ ํ˜•์œผ๋กœ ๋ถ„๋ฅ˜ํ•ฉ๋‹ˆ๋‹ค:

  • ์˜ˆ์ธกํ˜• (Predictive)
  • ์ถ”๋ก ํ˜• (Inferential)
  • ์‚ฌ์‹คํ˜• (Factual)
  • ๋Œ€ํ™”ํ˜• (Conversational)

Out-of-Scope Use

  • Not suitable for general-purpose Korean sentence classification outside financial or economic contexts.
  • May not perform well on informal or highly colloquial text.

Bias, Risks, and Limitations

  • The model may carry biases present in the training dataset.
  • Misclassifications could have downstream implications if used for investment recommendations or financial analysis without verification.

Recommendations

Use this model in conjunction with human oversight, especially for high-stakes or production-level applications.
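
One pragmatic way to implement that oversight is to route low-confidence predictions to a human reviewer. A minimal sketch, assuming the model and tokenizer loaded as in the next section; the 0.8 softmax cutoff is an arbitrary illustrative value, not a validated setting:

import torch
import torch.nn.functional as F

def classify_with_review(text, model, tokenizer, threshold=0.8):
    """Return (label_id, confidence); flag the input for human review below the threshold."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        probs = F.softmax(model(**inputs).logits, dim=-1)
    confidence, label_id = probs.max(dim=-1)
    if confidence.item() < threshold:
        print(f"Low confidence ({confidence.item():.2f}) - route to human review")
    return label_id.item(), confidence.item()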

How to Get Started with the Model

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("kwoncho/sentence_type_classification")
model = AutoModelForSequenceClassification.from_pretrained("kwoncho/sentence_type_classification")

text = "해당 종목은 단기적으로 하락할 가능성이 있습니다."  # "This stock may decline in the short term."
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# Pick the highest-scoring class; id2label (if set in the config) maps the index to a label name.
predicted_id = outputs.logits.argmax(dim=-1).item()
print(model.config.id2label[predicted_id])
Technical Specifications

  • Format: Safetensors
  • Model size: 111M parameters
  • Tensor type: F32