grammar-synthesis: flan-t5-xl


This model is a fine-tuned version of google/flan-t5-xl on an extended version of the JFLEG dataset.



Model description

The intent is to create a text2text language model that performs "single-shot grammar correction" on potentially ungrammatical text that may contain many errors, with the important qualifier that it does not semantically change text/information that IS grammatically correct.

Try some of the more severe error examples on other grammar correction models and compare the results to see the difference :)

Limitations

  • Dataset license: cc-by-nc-sa-4.0
  • Model license: apache-2.0
  • Currently a work in progress! While probably useful for "single-shot grammar correction" in many cases, check the output for correctness, ok?
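Because the output should be checked, one low-effort way to review a correction is a word-level diff between the input and the model's output, which makes any unintended change to already-correct text easy to spot. A minimal sketch using Python's standard `difflib`; the `word_edits` helper is hypothetical, and the example pair below is hand-written for illustration, not actual output from this model.

```python
import difflib


def word_edits(original: str, corrected: str) -> list[tuple[str, str, str]]:
    """Return (op, original_words, corrected_words) for every changed span.

    op is one of difflib's opcode tags: 'replace', 'delete', or 'insert'.
    Unchanged ('equal') spans are skipped so only the edits remain.
    """
    a, b = original.split(), corrected.split()
    edits = []
    for op, i1, i2, j1, j2 in difflib.SequenceMatcher(None, a, b).get_opcodes():
        if op != "equal":
            edits.append((op, " ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return edits


if __name__ == "__main__":
    # HYPOTHETICAL input/output pair, written by hand (not real model output).
    src = "Their going to the store tomorow."
    out = "They're going to the store tomorrow."
    for op, before, after in word_edits(src, out):
        print(f"{op:>7}: {before!r} -> {after!r}")
```

Any edit that touches words you consider already correct is a candidate for the kind of semantic drift the model is meant to avoid.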

Training procedure

Training hyperparameters

Session One

  • TODO: add details. This session was a single epoch at a higher learning rate.

Session Two

The following hyperparameters were used during training:

  • learning_rate: 4e-05
  • train_batch_size: 4
  • eval_batch_size: 1
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 64
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.02
  • num_epochs: 2.0
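The learning-rate schedule implied by these settings can be sketched in plain Python. This is a minimal sketch, assuming the usual linear-warmup-then-cosine-decay shape (as in Hugging Face `transformers`' `get_cosine_schedule_with_warmup` with its default half cycle); `total_steps` is hypothetical, since the card does not state the dataset size or number of optimizer steps. It also checks that the listed total batch size equals `train_batch_size × gradient_accumulation_steps` (4 × 16 = 64); no device count is listed on the card.

```python
import math

# Values copied from the Session Two hyperparameter list.
LEARNING_RATE = 4e-05
TRAIN_BATCH_SIZE = 4
GRADIENT_ACCUMULATION_STEPS = 16
WARMUP_RATIO = 0.02

# Examples consumed per optimizer step (the card lists no device count).
total_train_batch_size = TRAIN_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS


def cosine_lr_with_warmup(step: int, total_steps: int,
                          base_lr: float = LEARNING_RATE,
                          warmup_ratio: float = WARMUP_RATIO) -> float:
    """Learning rate at optimizer step `step`: linear warmup, then cosine decay to 0."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr over the warmup steps.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    # Half a cosine period: base_lr right after warmup, 0 at the last step.
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    # HYPOTHETICAL step count; the real value depends on the unlisted dataset size.
    total_steps = 1000
    print(total_train_batch_size)  # 64
    for step in (0, 10, 20, 510, 1000):
        print(step, cosine_lr_with_warmup(step, total_steps))
```

With the hypothetical 1000 steps, warmup lasts 20 steps (2%), the rate peaks at 4e-05, falls to about 2e-05 halfway through the decay, and reaches 0 at the final step.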
Model size

  • Parameters: 2.92B
  • Tensor type: F32 (Safetensors)
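As a rough sketch of what F32 storage means in practice, the snippet below estimates the memory needed just to hold the weights (4 bytes per F32 parameter). It ignores activations, generation cache, and framework overhead, so treat the result as a lower bound; the `BF16` entry only illustrates the effect of casting to a 16-bit dtype and is not a format this repository is stated to ship.

```python
# Figures from the card: 2.92B parameters stored as F32.
NUM_PARAMS = 2.92e9
BYTES_PER_PARAM = {"F32": 4, "BF16": 2}  # BF16 is an illustrative assumption


def weight_bytes(num_params: float, dtype: str) -> float:
    """Bytes needed for the weights alone (no activations or runtime overhead)."""
    return num_params * BYTES_PER_PARAM[dtype]


f32_gb = weight_bytes(NUM_PARAMS, "F32") / 1e9     # decimal gigabytes
f32_gib = weight_bytes(NUM_PARAMS, "F32") / 2**30  # binary gibibytes
bf16_gb = weight_bytes(NUM_PARAMS, "BF16") / 1e9

if __name__ == "__main__":
    print(f"F32 weights:  {f32_gb:.2f} GB ({f32_gib:.2f} GiB)")
    print(f"BF16 weights: {bf16_gb:.2f} GB")
```

At F32 this comes to about 11.7 GB (roughly 10.9 GiB) for the weights alone; a 16-bit copy would need about half that.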
