---
library_name: transformers
license: cc-by-4.0
base_model: paust/pko-t5-base
tags:
- generated_from_trainer
model-index:
- name: correction
  results: []
---

# Basic Inference

```python
import torch
from transformers import T5TokenizerFast, T5ForConditionalGeneration

# Run on GPU when available; fall back to CPU otherwise.
device = 'cuda' if torch.cuda.is_available() else 'cpu'

tokenizer = T5TokenizerFast.from_pretrained('ij5/whitespace-correction')
model = T5ForConditionalGeneration.from_pretrained('ij5/whitespace-correction').to(device)


def fix_whitespace(text):
    # "띄어쓰기 교정" ("whitespace correction") is the task prefix the model expects.
    inputs = f"띄어쓰기 교정: {text}"
    tokenized = tokenizer(inputs, max_length=128, truncation=True, return_tensors='pt').to(device)
    output_ids = model.generate(
        input_ids=tokenized['input_ids'],
        attention_mask=tokenized['attention_mask'],
        max_length=128,
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


print(fix_whitespace("흔들 리는 가지 사이로 불쑥 바람의 형상 이 드 러나기라도 할 것처럼."))
# result: 흔들리는 가지 사이로 불쑥 바람의 형상이 드러나기라도 할 것처럼.
```

# correction

This model is a fine-tuned version of [paust/pko-t5-base](https://huggingface.co/paust/pko-t5-base) on an unspecified dataset.
It achieves the following results on the evaluation set:
- Loss: 0.0160

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (an illustrative `Seq2SeqTrainingArguments` sketch is given at the end of this card):
- learning_rate: 5e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: AdamW (torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- num_epochs: 3

### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 0.0243        | 1.0   | 1688 | 0.0183          |
| 0.0172        | 2.0   | 3376 | 0.0165          |
| 0.0126        | 3.0   | 5064 | 0.0160          |

### Framework versions

- Transformers 4.49.0
- Pytorch 2.6.0+cu124
- Datasets 3.3.2
- Tokenizers 0.21.0
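
### Fine-tuning sketch

The training script itself is not included in this repository. As a rough, non-authoritative illustration of how the hyperparameters listed above could map onto the `transformers` Seq2Seq training API, a minimal sketch follows; the `output_dir`, dataset variables, and evaluation strategy are assumptions, not values taken from the original run.

```python
# Hypothetical reconstruction of the fine-tuning setup, assuming a standard
# Seq2SeqTrainer workflow. Dataset preparation is not shown and must be supplied.
from transformers import (
    T5TokenizerFast,
    T5ForConditionalGeneration,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

tokenizer = T5TokenizerFast.from_pretrained('paust/pko-t5-base')
model = T5ForConditionalGeneration.from_pretrained('paust/pko-t5-base')

# Replace with tokenized pairs of "띄어쓰기 교정: <input>" -> corrected text.
train_dataset = None
eval_dataset = None

training_args = Seq2SeqTrainingArguments(
    output_dir='correction',          # placeholder output directory
    learning_rate=5e-05,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    num_train_epochs=3,
    lr_scheduler_type='linear',
    optim='adamw_torch',              # AdamW with betas=(0.9, 0.999), eps=1e-08
    eval_strategy='epoch',            # per-epoch evaluation (assumption)
)

trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
# trainer.train()
```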