pakawadeep committed on
Commit 67309be · verified · 1 Parent(s): 8c065b0

Update README.md

Files changed (1)
  1. README.md +70 -52
README.md CHANGED
@@ -2,79 +2,97 @@
  license: apache-2.0
  base_model: google/mt5-large
  tags:
  - generated_from_keras_callback
  model-index:
- - name: pakawadeep/mt5-large-finetuned-ctfl-augmented_1
-   results: []
  ---

- <!-- This model card has been generated automatically according to the information Keras had access to. You should
- probably proofread and complete it, then remove this comment. -->

- # pakawadeep/mt5-large-finetuned-ctfl-augmented_1
-
- This model is a fine-tuned version of [google/mt5-large](https://huggingface.co/google/mt5-large) on an unknown dataset.
- It achieves the following results on the evaluation set:
- - Train Loss: 0.2041
- - Validation Loss: 0.7119
- - Train Rouge1: 8.6634
- - Train Rouge2: 0.6931
- - Train Rougel: 8.5691
- - Train Rougelsum: 8.6987
- - Train Gen Len: 11.9158
- - Epoch: 21

  ## Model description

- More information needed

  ## Intended uses & limitations

- More information needed

  ## Training and evaluation data

- More information needed

  ## Training procedure

  ### Training hyperparameters
-
- The following hyperparameters were used during training:
- - optimizer: {'name': 'AdamWeightDecay', 'learning_rate': 2e-05, 'decay': 0.0, 'beta_1': 0.9, 'beta_2': 0.999, 'epsilon': 1e-07, 'amsgrad': False, 'weight_decay_rate': 0.01}
- - training_precision: float32
-
- ### Training results
-
- | Train Loss | Validation Loss | Train Rouge1 | Train Rouge2 | Train Rougel | Train Rougelsum | Train Gen Len | Epoch |
- |:----------:|:---------------:|:------------:|:------------:|:------------:|:---------------:|:-------------:|:-----:|
- | 3.7859 | 1.7737 | 3.8966 | 1.1818 | 3.8139 | 3.8868 | 12.8069 | 0 |
- | 1.7728 | 1.2922 | 6.8010 | 1.1881 | 6.7657 | 6.7657 | 11.7376 | 1 |
- | 1.3356 | 1.0734 | 7.3020 | 1.8152 | 7.1782 | 7.3020 | 11.9010 | 2 |
- | 1.1070 | 0.9405 | 8.2037 | 2.1782 | 7.9915 | 8.2037 | 12.0198 | 3 |
- | 0.9583 | 0.8494 | 8.2037 | 2.1782 | 7.9915 | 8.2037 | 11.9901 | 4 |
- | 0.8463 | 0.7866 | 9.0288 | 2.4257 | 8.8873 | 8.9109 | 11.9802 | 5 |
- | 0.7662 | 0.7320 | 8.9816 | 2.3762 | 8.7694 | 8.8755 | 11.8960 | 6 |
- | 0.6961 | 0.7024 | 8.7341 | 1.8812 | 8.6457 | 8.6987 | 11.9010 | 7 |
- | 0.6444 | 0.6952 | 8.7341 | 1.8812 | 8.6457 | 8.6987 | 11.9406 | 8 |
- | 0.5881 | 0.6612 | 8.2862 | 0.7921 | 8.2390 | 8.2744 | 11.8960 | 9 |
- | 0.5386 | 0.6746 | 8.4689 | 1.3861 | 8.4335 | 8.4512 | 11.9307 | 10 |
- | 0.4944 | 0.6473 | 8.4689 | 1.3861 | 8.4335 | 8.4512 | 11.9406 | 11 |
- | 0.4524 | 0.6328 | 7.7793 | 0.7921 | 7.7027 | 7.7558 | 11.9307 | 12 |
- | 0.4161 | 0.6521 | 8.4689 | 1.3861 | 8.4335 | 8.4512 | 11.9307 | 13 |
- | 0.3812 | 0.6311 | 8.2862 | 0.7921 | 8.2390 | 8.2744 | 11.9109 | 14 |
- | 0.3488 | 0.6368 | 8.2862 | 0.7921 | 8.2390 | 8.2744 | 11.8960 | 15 |
- | 0.3181 | 0.6449 | 8.7812 | 0.7921 | 8.6987 | 8.7930 | 11.9455 | 16 |
- | 0.2898 | 0.6495 | 8.8461 | 0.8911 | 8.7400 | 8.8637 | 11.9307 | 17 |
- | 0.2677 | 0.6583 | 8.8461 | 0.8911 | 8.7400 | 8.8637 | 11.9059 | 18 |
- | 0.2435 | 0.6823 | 8.8461 | 0.8911 | 8.7400 | 8.8637 | 11.9653 | 19 |
- | 0.2227 | 0.6897 | 8.6575 | 0.6931 | 8.5337 | 8.6693 | 11.9703 | 20 |
- | 0.2041 | 0.7119 | 8.6634 | 0.6931 | 8.5691 | 8.6987 | 11.9158 | 21 |
-

  ### Framework versions
-
  - Transformers 4.41.2
  - TensorFlow 2.15.0
  - Datasets 2.20.0
  - Tokenizers 0.19.1

  license: apache-2.0
  base_model: google/mt5-large
  tags:
+ - thai
+ - grammatical-error-correction
+ - mt5
+ - fine-tuned
+ - l2-learners
  - generated_from_keras_callback
  model-index:
+ - name: pakawadeep/ctfl-gec-th
+   results:
+   - task:
+       name: Grammatical Error Correction
+       type: text2text-generation
+     dataset:
+       name: CTFL-GEC (augmented with Self-Instruct 200%)
+       type: custom
+     metrics:
+     - name: Precision
+       type: precision
+       value: 0.47
+     - name: Recall
+       type: recall
+       value: 0.47
+     - name: F1
+       type: f1
+       value: 0.47
+     - name: F0.5
+       type: f0.5
+       value: 0.47
+     - name: BLEU
+       type: bleu
+       value: 0.69
+     - name: GLEU
+       type: gleu
+       value: 0.68
+     - name: CHRF
+       type: chrf
+       value: 0.87
+ language:
+ - th
  ---

+ # pakawadeep/ctfl-gec-th

+ This model is a fine-tuned version of [google/mt5-large](https://huggingface.co/google/mt5-large), trained for **Grammatical Error Correction (GEC)** in **Thai** for **L2 learners**. It was developed as part of the research *"Grammatical Error Correction for L2 Learners of Thai Using Large Language Models"*, and represents the best-performing model in the study.

  ## Model description

+ This model is based on the mT5-large architecture and was fine-tuned on the CTFL-GEC dataset, which contains human-annotated grammatical error corrections from L2 Thai learners. To improve generalization, the dataset was augmented using the Self-Instruct method with 200% additional synthetic pairs.
+
+ The model is capable of correcting sentence-level grammatical errors typical of L2 Thai writing, including issues with word order, omissions, and incorrect particles.

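A minimal inference sketch with the `transformers` TensorFlow classes is shown below. The repository id is taken from this card; the example sentence, the generation settings, and the assumption that no task prefix is required are illustrative rather than documented behaviour.

```python
# Minimal inference sketch (assumptions: TF weights are published for this
# checkpoint and no task prefix is needed; the input sentence is a placeholder).
from transformers import AutoTokenizer, TFAutoModelForSeq2SeqLM

model_id = "pakawadeep/ctfl-gec-th"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = TFAutoModelForSeq2SeqLM.from_pretrained(model_id)

text = "ฉันไปโรงเรียนเมื่อวานนี้"  # placeholder learner sentence
inputs = tokenizer(text, return_tensors="tf")
outputs = model.generate(**inputs, max_new_tokens=64)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
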
  ## Intended uses & limitations

+ ### Intended uses
+ - Grammatical error correction for Thai language learners
+ - Linguistic analysis of L2 learner errors
+ - Research in low-resource GEC methods
+
+ ### Limitations
+ - May not generalize to informal or dialectal Thai
+ - Performance may degrade on sentence types or domains not represented in the training data
+ - Designed for Thai GEC only; not optimized for multilingual correction tasks

  ## Training and evaluation data

+ The model was fine-tuned on a combined dataset consisting of:
+ - **CTFL-GEC**: A manually annotated corpus of Thai learner writing (370 writing samples, 4,200+ sentences)
+ - **Self-Instruct augmentation (200%)**: Synthetic GEC pairs generated using LLM prompting
+
+ Evaluation was conducted on a held-out portion of the human-annotated dataset using common GEC metrics.

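The card reports precision, recall, F1, F0.5, BLEU, GLEU, and chrF, but does not ship its evaluation script. The snippet below is only a sketch of how corpus-level BLEU and chrF (via `sacrebleu`) and an F0.5 value (from precision and recall) are typically computed for GEC outputs; the sentence pairs, the 0-1 rescaling, and any Thai-specific segmentation are assumptions.

```python
# Illustrative evaluation sketch, not the authors' pipeline.
# `hyps` are model corrections, `refs` the gold corrections (plain strings,
# pre-segmented appropriately for Thai before scoring).
import sacrebleu

hyps = ["ฉันกินข้าวแล้ว"]  # placeholder model output
refs = ["ฉันกินข้าวแล้ว"]  # placeholder gold correction

bleu = sacrebleu.corpus_bleu(hyps, [refs]).score / 100   # rescaled to 0-1 as in the card
chrf = sacrebleu.corpus_chrf(hyps, [refs]).score / 100

def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """Generic F-beta; F0.5 weights precision twice as heavily as recall."""
    if precision == 0 and recall == 0:
        return 0.0
    return (1 + beta**2) * precision * recall / (beta**2 * precision + recall)

print(f"BLEU={bleu:.2f}  chrF={chrf:.2f}  F0.5={f_beta(0.47, 0.47):.2f}")
```
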
  ## Training procedure

  ### Training hyperparameters
+ - **Optimizer**: AdamWeightDecay
+ - **Learning rate**: 2e-5
+ - **Beta1/Beta2**: 0.9 / 0.999
+ - **Epsilon**: 1e-7
+ - **Weight decay**: 0.01
+ - **Precision**: float32

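These values match the `AdamWeightDecay` optimizer shipped with `transformers` for TensorFlow. A rough sketch of recreating that configuration is shown below; the compile step and variable names are assumptions implied by the `generated_from_keras_callback` tag, not the authors' actual training script.

```python
# Sketch of the reported optimizer configuration using transformers' TF utilities.
from transformers import AdamWeightDecay, TFAutoModelForSeq2SeqLM

model = TFAutoModelForSeq2SeqLM.from_pretrained("google/mt5-large")

optimizer = AdamWeightDecay(
    learning_rate=2e-5,
    beta_1=0.9,
    beta_2=0.999,
    epsilon=1e-7,
    weight_decay_rate=0.01,
)

# Keras-style compilation; transformers TF models compute their loss internally.
model.compile(optimizer=optimizer)
```
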
  ### Framework versions
 
  - Transformers 4.41.2
  - TensorFlow 2.15.0
  - Datasets 2.20.0
  - Tokenizers 0.19.1
+
+ ## Citation
+
+ If you use this model, please cite the associated thesis:
+
+ ```
+ Pakawadee P. Chookwan, "Grammatical Error Correction for L2 Learners of Thai Using Large Language Models", 2025.
+ ```