---
language: ko
license: apache-2.0
tags:
- text2text-generation
- korean
- politeness
- typo-correction
---

# Finetuned ET5 for Politeness and Typo Correction

This model is a fine-tuned version of `j5ng/et5-typos-corrector` for politeness enhancement and typo correction in Korean text. It transforms informal or typo-laden sentences into polite, grammatically correct ones.

## Dataset

- **Source**: Custom dataset (`last_dataset_v2.jsonl`)
- **Size**: ~300 examples
- **Task**: Converts informal or erroneous Korean sentences into polite, correct ones.

## Usage

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("kimseongsan/finetuned-et5-politeness")
model = AutoModelForSeq2SeqLM.from_pretrained("kimseongsan/finetuned-et5-politeness")

# "공손화:" ("make polite:") is the task prefix; the input roughly means
# "Why is this wrong again? Come on."
input_text = "공손화: 왜 이거 또 틀렸어요?좀"
inputs = tokenizer(input_text, return_tensors="pt", max_length=64, truncation=True)
outputs = model.generate(**inputs, max_length=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
# Output: "왜 이것을 또 틀리셨나요? 조금 더 주의해 주시면 좋겠습니다."
# ("Why did you get this wrong again? I would appreciate a little more care.")
```

## Training

- **Base Model**: `j5ng/et5-typos-corrector`
- **Training Args**:
  - Learning Rate: 2e-5
  - Epochs: 5
  - Batch Size: 8
  - Optimizer: AdamW
- **Hardware**: GPU (e.g., NVIDIA T4)

## Limitations

- The small dataset (~300 examples) may lead to overfitting.
- The data is limited to an educational context (e.g., "쌤" ("teacher"), "숙제" ("homework")); generalizing to other domains may require additional data.

## License

Apache 2.0
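For reference, a minimal sketch of how JSONL training pairs like those in `last_dataset_v2.jsonl` could be loaded. The card does not document the file's schema, so the field names `input`/`output` and the sample sentence below are assumptions, not the dataset's actual contents.

```python
import json

# Hypothetical example of one line from the JSONL file; the real field
# names and sentences may differ.
sample_line = '{"input": "공손화: 숙제 왜 안 냈어", "output": "숙제를 왜 제출하지 않으셨나요?"}'

def load_jsonl(lines):
    """Parse JSONL lines into (source, target) training pairs."""
    pairs = []
    for line in lines:
        record = json.loads(line)
        pairs.append((record["input"], record["output"]))
    return pairs

pairs = load_jsonl([sample_line])
print(pairs[0][0])  # the source sentence, including the "공손화:" task prefix
```

Each source sentence keeps the `공손화:` prefix so the fine-tuned model sees the same format at training and inference time.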