KoBART Korean-to-Korean Sign Language Translation Model
This model is based on gogamza/kobart-base-v2 and has been fine-tuned as a Transformer-based Seq2Seq model to automatically convert Korean sentences into Korean Sign Language (KSL) grammatical structures.
Model description
- Takes a Korean sentence as input and outputs a sentence converted to Korean Sign Language grammar (e.g., SOV word order)
- A technical approach to compensating for the shortage of sign language interpreters and improving information accessibility for deaf users
- Tokenizer: uses `KoBARTTokenizer`, including the special tokens `<s>`, `</s>`, and `<pad>`
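As an illustration of how these special tokens frame a sequence, here is a minimal sketch. The real `KoBARTTokenizer` also performs subword segmentation and maps tokens to IDs; this only shows the `<s>`/`</s>` wrapping and `<pad>` right-padding behavior.

```python
# Minimal sketch of BOS/EOS wrapping and right-padding with <pad>.
# The actual KoBARTTokenizer additionally does subword segmentation
# and ID conversion; this is for illustration only.

def frame_tokens(tokens, max_length=8, bos="<s>", eos="</s>", pad="<pad>"):
    """Wrap a token list with BOS/EOS and pad (or truncate) to max_length."""
    framed = [bos] + list(tokens) + [eos]
    framed += [pad] * (max_length - len(framed))
    return framed[:max_length]

print(frame_tokens(["오늘", "날씨", "어때", "?"]))
# → ['<s>', '오늘', '날씨', '어때', '?', '</s>', '<pad>', '<pad>']
```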
Intended uses & limitations
Intended uses
- Takes speech recognition output (e.g., from Whisper) as input and converts it into sign-language-style sentences
- Can serve as the backend text-processing step in information delivery systems for deaf users, such as news or public announcement broadcasts
Limitations
- Trained on a Korean–KSL parallel corpus; results may be inaccurate for out-of-domain sentences
- Does not include sign language video generation (it handles only the text conversion step)
Dataset
- Source: National Institute of Korean Language, Korean–Korean Sign Language parallel corpus
- Format: converted to a TSV file (column names: `koreanText`, `sign_lang_sntenc`)
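A minimal sketch of loading such a TSV file with the Python standard library. The column names `koreanText` and `sign_lang_sntenc` follow the dataset description above; the file path is a placeholder.

```python
import csv

def load_parallel_tsv(path):
    """Read (Korean, KSL) sentence pairs from a TSV file with a header row."""
    pairs = []
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f, delimiter="\t")
        for row in reader:
            pairs.append((row["koreanText"], row["sign_lang_sntenc"]))
    return pairs

# Usage (path is hypothetical):
# pairs = load_parallel_tsv("korean_ksl.tsv")
```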
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-5
- max_length: 128
- num_train_epochs: 3
- per_device_train_batch_size: 16
- gradient_accumulation_steps: 2
- warmup_steps: 500
- fp16: True
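The hyperparameters above map onto `transformers` training arguments roughly as follows. This is a sketch under stated assumptions, not the exact training script: `output_dir` is a placeholder, and `max_length: 128` is applied when tokenizing inputs and targets (or at generation time), not through these arguments. Note the effective training batch size is 16 × 2 = 32 per device.

```python
from transformers import Seq2SeqTrainingArguments

# Sketch of the reported settings (not the original training script).
# max_length=128 is applied at tokenization/generation time instead.
training_args = Seq2SeqTrainingArguments(
    output_dir="./kobart-ksl",       # placeholder output path
    learning_rate=5e-5,
    num_train_epochs=3,
    per_device_train_batch_size=16,
    gradient_accumulation_steps=2,   # effective batch size: 16 * 2 = 32
    warmup_steps=500,
    fp16=True,
)
```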
Example usage
```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("chaem/kobart-ksl-translation")
model = AutoModelForSeq2SeqLM.from_pretrained("chaem/kobart-ksl-translation")

sentence = "오늘 날씨 어때?"  # "How is the weather today?"
inputs = tokenizer(sentence, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
Training results
Framework versions
- Transformers 4.35.2
- Pytorch 2.1.0
- Datasets 2.15.0
- Tokenizers 0.15.0
Model tree for chaem/kobart-ksl-translation
Base model: gogamza/kobart-base-v2