T5-Small Grammar Correction

A fine-tuned t5-small model for correcting grammar errors in English text. Given a sentence, the model generates a grammatically correct version using a text-to-text approach.
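
Below is a minimal usage sketch with the Transformers library. The repo id Harshathemonster/t5-small-updated comes from this card's model tree; the "grammar: " task prefix is an assumption, since the card does not document the exact prompt format used during fine-tuning.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "Harshathemonster/t5-small-updated"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# NOTE: the "grammar: " prefix is an assumption; T5 fine-tunes often
# use a task prefix, but this card does not specify one.
text = "grammar: she go to school every days"
inputs = tokenizer(text, return_tensors="pt", truncation=True)
outputs = model.generate(**inputs, max_new_tokens=64, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```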

Model Details

  • Developed by: Harsha Vardhan N
  • Model type: Sequence-to-Sequence Transformer
  • Language(s): English
  • License: Apache 2.0
  • Finetuned from model: google-t5/t5-small
  • Model size: ~60.5M parameters (FP32, Safetensors)

Training Details

Training Data

The model was fine-tuned on the wiki_auto/auto_full_with_split dataset, a large-scale corpus of aligned complex and simplified English sentence pairs automatically extracted from English Wikipedia and Simple English Wikipedia, originally built for sentence-level text simplification. For this task, the aligned pairs were repurposed to teach the model to rewrite ungrammatical or awkward sentences as fluent, grammatically correct English.
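
For reference, a minimal loading sketch with the datasets library follows. The split name "full" and the normal_sentence/simple_sentence column names follow the public wiki_auto dataset card; trust_remote_code=True is needed on recent datasets versions because wiki_auto ships a loading script.

```python
from datasets import load_dataset

# wiki_auto is a script-based dataset, so recent versions of
# `datasets` require trust_remote_code=True to run its script.
ds = load_dataset("wiki_auto", "auto_full_with_split", trust_remote_code=True)

# Aligned pairs: "normal_sentence" (complex) and "simple_sentence" (simplified).
pair = ds["full"][0]
print(pair["normal_sentence"])
print(pair["simple_sentence"])
```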

Training Procedure

  • Epochs: 3
  • Training Duration: ~1 hour
  • Optimizer: AdamW (via Hugging Face Seq2SeqTrainer)
  • Learning Rate: 5e-5
  • Batch Size: 8
  • Environment: Google Colab GPU
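
A minimal sketch of how the hyperparameters above map onto Seq2SeqTrainingArguments, reusing the wiki_auto pairs from the earlier snippet. The output_dir, the 128-token truncation length, and the preprocessing function are placeholders, not values documented by this card; only the epochs, learning rate, batch size, and trainer class are taken from the list above.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("google-t5/t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("google-t5/t5-small")

# Placeholder preprocessing: tokenize the aligned sentence pairs.
def preprocess(batch):
    model_inputs = tokenizer(
        batch["normal_sentence"], truncation=True, max_length=128
    )
    labels = tokenizer(
        text_target=batch["simple_sentence"], truncation=True, max_length=128
    )
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

raw = load_dataset("wiki_auto", "auto_full_with_split", trust_remote_code=True)
train_dataset = raw["full"].map(
    preprocess, batched=True, remove_columns=raw["full"].column_names
)

# Hyperparameters from this card; everything else is left at defaults.
args = Seq2SeqTrainingArguments(
    output_dir="t5-small-grammar",   # hypothetical output path
    num_train_epochs=3,
    learning_rate=5e-5,
    per_device_train_batch_size=8,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()  # the Trainer uses AdamW by default
```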

Technical Specifications

Compute Infrastructure

Hardware

  • GPU: Google Colab instance (likely an NVIDIA Tesla T4)

Software

  • Framework: Hugging Face Transformers, PyTorch
  • Trainer Used: Seq2SeqTrainer