KoBART Korean-to-Korean Sign Language Translation Model

This model is based on gogamza/kobart-base-v2 and has been fine-tuned as a Transformer-based Seq2Seq model to automatically convert Korean sentences into Korean Sign Language (KSL) grammatical structures.

Model description

  • ํ•œ๊ตญ์–ด ๋ฌธ์žฅ์„ ์ž…๋ ฅ์œผ๋กœ ๋ฐ›์•„ ์ˆ˜์–ด ๋ฌธ๋ฒ•(SOV ๋“ฑ)์— ๋งž์ถ˜ ๋ณ€ํ™˜๋œ ๋ฌธ์žฅ์„ ์ถœ๋ ฅ
  • ์ˆ˜์–ด ํ†ต์—ญ์‚ฌ ๋ถ€์กฑ ๋ฌธ์ œ๋ฅผ ๋ณด์™„ํ•˜๊ณ , ๋†์ธ์˜ ์ •๋ณด ์ ‘๊ทผ์„ฑ์„ ๋†’์ด๊ธฐ ์œ„ํ•œ ๊ธฐ์ˆ ์  ์ ‘๊ทผ
  • ํ† ํฌ๋‚˜์ด์ €๋Š” KoBARTTokenizer ์‚ฌ์šฉ, ํŠน์ˆ˜ ํ† ํฐ <s>, </s>, <pad> ํฌํ•จ

Intended uses & limitations

Intended uses

  • ์Œ์„ฑ ์ธ์‹ ๊ฒฐ๊ณผ(์˜ˆ: Whisper)๋ฅผ ์ž…๋ ฅ์œผ๋กœ ๋ฐ›์•„ ์ˆ˜์–ด ํ˜•ํƒœ ๋ฌธ์žฅ์œผ๋กœ ๋ณ€ํ™˜
  • ๋‰ด์Šค, ์•ˆ๋‚ด ๋ฐฉ์†ก ๋“ฑ ๋†์ธ ๋Œ€์ƒ ์ •๋ณด ์ „๋‹ฌ ์‹œ์Šคํ…œ์˜ ๋ฐฑ์—”๋“œ ์ฒ˜๋ฆฌ์— ์‚ฌ์šฉ ๊ฐ€๋Šฅ

Limitations

  • ํ•œ๊ตญ์–ด-์ˆ˜์–ด ๋ณ‘๋ ฌ ๋ง๋ญ‰์น˜ ๊ธฐ๋ฐ˜์œผ๋กœ ํ›ˆ๋ จ๋˜์—ˆ์œผ๋ฉฐ, ๋„๋ฉ”์ธ ์™ธ ๋ฌธ์žฅ์—๋Š” ๋ถ€์ •ํ™•ํ•  ์ˆ˜ ์žˆ์Œ
  • ์ˆ˜์–ด ์˜์ƒ์„ ์ƒ์„ฑํ•˜๋Š” ๊ธฐ๋Šฅ์€ ํฌํ•จ๋˜์–ด ์žˆ์ง€ ์•Š์Œ (ํ…์ŠคํŠธ ๋ณ€ํ™˜๊นŒ์ง€๋งŒ ์ฒ˜๋ฆฌ)

Dataset

  • ์ถœ์ฒ˜: ๊ตญ๋ฆฝ๊ตญ์–ด์› ํ•œ๊ตญ์–ด-ํ•œ๊ตญ์ˆ˜์–ด ๋ณ‘๋ ฌ ๋ง๋ญ‰์น˜
  • ํ˜•์‹: TSV ํŒŒ์ผ๋กœ ๋ณ€ํ™˜ํ•˜์—ฌ ์‚ฌ์šฉ (์—ด ์ด๋ฆ„: koreanText, sign_lang_sntenc)

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-5
  • max_length: 128
  • num_train_epochs: 3
  • per_device_train_batch_size: 16
  • gradient_accumulation_steps: 2
  • warmup_steps: 500
  • fp16: True

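With gradient accumulation, gradients from several forward passes are summed before each optimizer step, so the effective batch size is per_device_train_batch_size × gradient_accumulation_steps × number of devices. A quick check of the values above, assuming a single GPU (the device count is not stated in this card):

```python
# Hyperparameters from the list above.
per_device_train_batch_size = 16
gradient_accumulation_steps = 2
num_devices = 1  # assumption: single-GPU training

effective_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * num_devices
)
print(effective_batch_size)  # → 32
```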
Example usage

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("./")
model = AutoModelForSeq2SeqLM.from_pretrained("./")

sentence = "오늘 날씨 어때?"
inputs = tokenizer(sentence, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Training results

(Training results figure not reproduced here.)

Framework versions

  • Transformers 4.35.2
  • Pytorch 2.1.0
  • Datasets 2.15.0
  • Tokenizers 0.15.0