Finetuned ET5 for Politeness and Typo Correction

This model is a fine-tuned version of j5ng/et5-typos-corrector for politeness enhancement and typo correction in Korean text. It transforms informal or typo-laden sentences into polite, grammatically correct ones.

Dataset

  • Source: Custom dataset (last_dataset_v2.jsonl)
  • Size: ~300 examples
  • Task: converting informal or error-ridden Korean sentences into polite, grammatically correct ones (see the loading sketch below).
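
The dataset file itself is not published with this card. A minimal loading sketch, assuming each JSONL line pairs an informal source sentence with its polite target under hypothetical "input"/"output" keys:

import json

# Hypothetical record layout -- the field names are an assumption,
# since last_dataset_v2.jsonl is not distributed with this card:
# {"input": "공손화: 왜 이거 또 틀렸어요?좀",
#  "output": "왜 이것을 또 틀리셨나요? 조금 더 주의해 주시면 좋겠습니다."}
with open("last_dataset_v2.jsonl", encoding="utf-8") as f:
    pairs = [json.loads(line) for line in f]

print(len(pairs))  # ~300 examples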

Usage

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("kimseongsan/finetuned-et5-politeness")
model = AutoModelForSeq2SeqLM.from_pretrained("kimseongsan/finetuned-et5-politeness")

input_text = "공손화: 왜 이거 또 틀렸어요?좀"  # "공손화:" ("make polite:") is the task prefix; the rest reads "Why did you get this wrong again?! Come on"
inputs = tokenizer(input_text, return_tensors="pt", max_length=64, truncation=True)
outputs = model.generate(**inputs, max_length=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
# Output: "왜 이것을 또 틀리셨나요? 조금 더 주의해 주시면 좋겠습니다."
# ("Why did you get this wrong again? It would be good if you were a bit more careful.")
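
For quick experiments, the same checkpoint can also be driven through the transformers text2text-generation pipeline; the task prefix and generation settings below simply mirror the example above:

from transformers import pipeline

corrector = pipeline(
    "text2text-generation",
    model="kimseongsan/finetuned-et5-politeness",
)
result = corrector("공손화: 왜 이거 또 틀렸어요?좀", max_length=64)
print(result[0]["generated_text"])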

Training

  • Base Model: j5ng/et5-typos-corrector
  • Training Args (a fine-tuning sketch follows this list):
    • Learning Rate: 2e-5
    • Epochs: 5
    • Batch Size: 8
    • Optimizer: AdamW
  • Hardware: GPU (e.g., NVIDIA T4)
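
A minimal fine-tuning sketch wiring these hyperparameters into Seq2SeqTrainingArguments; the tokenized dataset (train_data) is a placeholder, since the actual training script is not published:

from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("j5ng/et5-typos-corrector")
model = AutoModelForSeq2SeqLM.from_pretrained("j5ng/et5-typos-corrector")

args = Seq2SeqTrainingArguments(
    output_dir="finetuned-et5-politeness",
    learning_rate=2e-5,             # as listed above
    num_train_epochs=5,             # as listed above
    per_device_train_batch_size=8,  # as listed above
    optim="adamw_torch",            # AdamW
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=train_data,  # placeholder: tokenized input/output pairs
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()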

Limitations

  • Small dataset size may lead to overfitting.
  • Limited to educational contexts (e.g., "쌤", informal for "teacher"; "숙제", "homework"). Generalization to other domains may require additional data.

License

Apache 2.0

Model Details

  • Format: Safetensors
  • Model size: 324M params
  • Tensor type: F32