---
language: ko
license: apache-2.0
tags:
  - text2text-generation
  - korean
  - politeness
  - typo-correction
---

# Finetuned ET5 for Politeness and Typo Correction

This model is a fine-tuned version of j5ng/et5-typos-corrector for politeness enhancement and typo correction in Korean text. It transforms informal or typo-laden sentences into polite, grammatically correct ones.

## Dataset

- Source: custom dataset (last_dataset_v2.jsonl)
- Size: ~300 examples
- Task: convert informal or erroneous Korean sentences into polite, grammatically correct ones
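
The dataset is a JSONL file, one example per line. Its exact field names are not documented, so the `"input"`/`"output"` keys below are assumptions; this is a minimal, self-contained sketch of loading such a file with the standard library:

```python
import json

def load_jsonl(path):
    """Read one JSON object per line, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

# Write a tiny stand-in file so the sketch runs on its own;
# the real last_dataset_v2.jsonl has ~300 such records.
sample = [
    {"input": "공손화: 왜 이거 또 틀렸어?", "output": "왜 이것을 또 틀리셨나요?"},
    {"input": "공손화: 숙제 언제까지임?", "output": "숙제는 언제까지 제출하면 될까요?"},
]
with open("sample.jsonl", "w", encoding="utf-8") as f:
    for rec in sample:
        f.write(json.dumps(rec, ensure_ascii=False) + "\n")

data = load_jsonl("sample.jsonl")
print(len(data))  # 2
```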

## Usage

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("kimseongsan/finetuned-et5-politeness")
model = AutoModelForSeq2SeqLM.from_pretrained("kimseongsan/finetuned-et5-politeness")

# Note the "공손화:" ("politeness") task prefix before the raw sentence.
input_text = "공손화: 왜 이거 또 틀렸어요?좀"  # "Why did you get this wrong again? Come on"
inputs = tokenizer(input_text, return_tensors="pt", max_length=64, truncation=True)
outputs = model.generate(**inputs, max_length=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
# Output: "왜 이것을 또 틀리셨나요? 조금 더 주의해 주시면 좋겠습니다."
# ("Why did you get this wrong again? I'd appreciate it if you were a bit more careful.")
```
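
The example above prepends a "공손화: " task prefix to the raw sentence. Assuming the model always expects that prefix at inference time (inferred from the example, not separately documented), a tiny helper keeps callers from forgetting it:

```python
def add_politeness_prefix(sentence: str, prefix: str = "공손화: ") -> str:
    """Prepend the task prefix unless it is already present.

    The "공손화: " prefix is inferred from the usage example above;
    that the model requires it on every input is an assumption.
    """
    sentence = sentence.strip()
    return sentence if sentence.startswith(prefix.strip()) else prefix + sentence

print(add_politeness_prefix("왜 이거 또 틀렸어요?좀"))
# 공손화: 왜 이거 또 틀렸어요?좀
```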

## Training

- Base Model: j5ng/et5-typos-corrector
- Training Args:
  - Learning Rate: 2e-5
  - Epochs: 5
  - Batch Size: 8
  - Optimizer: AdamW
- Hardware: GPU (e.g., NVIDIA T4)
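
With ~300 examples, a batch size of 8, and 5 epochs, training amounts to roughly 190 optimizer steps. A quick sanity check (the dataset size is approximate, so the step count is too):

```python
import math

dataset_size = 300  # approximate, per the Dataset section
batch_size = 8
epochs = 5

steps_per_epoch = math.ceil(dataset_size / batch_size)
total_steps = steps_per_epoch * epochs
print(steps_per_epoch, total_steps)  # 38 190
```

At this scale a single T4 finishes in minutes; the small step count is one reason overfitting (noted under Limitations) is a real concern.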

## Limitations

- The small dataset (~300 examples) may lead to overfitting.
- Training data is limited to educational contexts (e.g., "쌤" [informal "teacher"], "숙제" ["homework"]). Generalizing to other domains may require additional data.

## License

Apache 2.0