---
language: ko
license: apache-2.0
tags:
- text2text-generation
- korean
- politeness
- typo-correction
---

# Finetuned ET5 for Politeness and Typo Correction

This model is a fine-tuned version of `j5ng/et5-typos-corrector` for politeness enhancement and typo correction in Korean text. It transforms informal or typo-laden sentences into polite, grammatically correct ones.

## Dataset

- **Source**: Custom dataset (`last_dataset_v2.jsonl`)
- **Size**: ~300 examples
- **Task**: Converts informal or erroneous Korean sentences into polite, correct ones.

## Usage

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("kimseongsan/finetuned-et5-politeness")
model = AutoModelForSeq2SeqLM.from_pretrained("kimseongsan/finetuned-et5-politeness")

# "공손화:" ("make polite:") is the task prefix; the input roughly means
# "Why is this wrong again? Come on."
input_text = "공손화: 왜 이거 또 틀렸어요?좀"
inputs = tokenizer(input_text, return_tensors="pt", max_length=64, truncation=True)
outputs = model.generate(**inputs, max_length=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
# Output: "왜 이것을 또 틀리셨나요? 조금 더 주의해 주시면 좋겠습니다."
# ("Why did you get this wrong again? I would appreciate a little more care.")
```

## Training

- **Base Model**: `j5ng/et5-typos-corrector`
- **Training Args**:
  - Learning Rate: 2e-5
  - Epochs: 5
  - Batch Size: 8
  - Optimizer: AdamW
- **Hardware**: GPU (e.g., NVIDIA T4)

## Limitations

- The small dataset (~300 examples) may lead to overfitting.
- The data is limited to an educational context (e.g., "쌤" ("teacher"), "숙제" ("homework")); generalizing to other domains may require additional data.

## License

Apache 2.0
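For reference, a minimal sketch of how JSONL training pairs like those in `last_dataset_v2.jsonl` could be loaded. The card does not document the file's schema, so the field names `input`/`output` and the sample sentence below are assumptions, not the dataset's actual contents.

```python
import json

# Hypothetical example of one line from the JSONL file; the real field
# names and sentences may differ.
sample_line = '{"input": "공손화: 숙제 왜 안 냈어", "output": "숙제를 왜 제출하지 않으셨나요?"}'

def load_jsonl(lines):
    """Parse JSONL lines into (source, target) training pairs."""
    pairs = []
    for line in lines:
        record = json.loads(line)
        pairs.append((record["input"], record["output"]))
    return pairs

pairs = load_jsonl([sample_line])
print(pairs[0][0])  # the source sentence, including the "공손화:" task prefix
```

Each source sentence keeps the `공손화:` prefix so the fine-tuned model sees the same format at training and inference time.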