crossroderick committed on
Commit
794cf97
·
1 Parent(s): 18cf0a2

Pre-v5 update for the tokeniser (training date pushed to the 25th)

Files changed (44)
  1. .gitignore +3 -1
  2. README.md +2 -2
  3. checkpoints/checkpoint-61500/model.safetensors +1 -1
  4. checkpoints/checkpoint-61500/optimizer.pt +1 -1
  5. checkpoints/checkpoint-61500/rng_state.pth +1 -1
  6. checkpoints/checkpoint-61500/scaler.pt +1 -1
  7. checkpoints/checkpoint-61500/scheduler.pt +1 -1
  8. checkpoints/checkpoint-61500/special_tokens_map.json +7 -0
  9. checkpoints/checkpoint-61500/tokenizer.json +0 -0
  10. checkpoints/checkpoint-61500/tokenizer_config.json +213 -204
  11. checkpoints/checkpoint-61500/trainer_state.json +344 -344
  12. checkpoints/checkpoint-62000/model.safetensors +1 -1
  13. checkpoints/checkpoint-62000/optimizer.pt +1 -1
  14. checkpoints/checkpoint-62000/rng_state.pth +1 -1
  15. checkpoints/checkpoint-62000/scaler.pt +1 -1
  16. checkpoints/checkpoint-62000/scheduler.pt +1 -1
  17. checkpoints/checkpoint-62000/special_tokens_map.json +7 -0
  18. checkpoints/checkpoint-62000/tokenizer.json +0 -0
  19. checkpoints/checkpoint-62000/tokenizer_config.json +213 -204
  20. checkpoints/checkpoint-62000/trainer_state.json +347 -347
  21. checkpoints/checkpoint-62228/model.safetensors +1 -1
  22. checkpoints/checkpoint-62228/optimizer.pt +1 -1
  23. checkpoints/checkpoint-62228/rng_state.pth +1 -1
  24. checkpoints/checkpoint-62228/scaler.pt +1 -1
  25. checkpoints/checkpoint-62228/scheduler.pt +1 -1
  26. checkpoints/checkpoint-62228/special_tokens_map.json +7 -0
  27. checkpoints/checkpoint-62228/tokenizer.json +0 -0
  28. checkpoints/checkpoint-62228/tokenizer_config.json +213 -204
  29. checkpoints/checkpoint-62228/trainer_state.json +347 -347
  30. config.json +0 -60
  31. generation_config.json +0 -7
  32. requirements.txt +1 -0
  33. special_tokens_map.json +0 -125
  34. src/tokeniser/added_tokens.json +102 -0
  35. model.safetensors → src/tokeniser/dalat5_sp.model +2 -2
  36. src/tokeniser/dalat5_sp.vocab +0 -0
  37. src/tokeniser/special_tokens_map.json +21 -4
  38. spiece.model → src/tokeniser/spiece.model +2 -2
  39. src/tokeniser/tokenizer.json +0 -0
  40. src/tokeniser/tokenizer_config.json +405 -412
  41. src/train_t5.py +17 -2
  42. src/train_tokeniser.py +30 -43
  43. tokenizer.json +0 -0
  44. tokenizer_config.json +0 -939
.gitignore CHANGED
@@ -1,7 +1,9 @@
  /src/data/extracted
  /src/data/kkwiki-latest-pages-articles.xml.bz2
  /src/data/kazakh_latin_corpus.jsonl
+ /src/data/tokeniser_corpus.txt
  /src/data/clean_corpus.jsonl
  /src/data/kk.txt
  /logs/**
- /src/test_t5.py
+ /src/test_t5.py
+ /src/test_tokeniser.py
README.md CHANGED
@@ -120,7 +120,7 @@ print(output)

  KazParC деректер жинағын жүктеп алу үшін сізге Hugging Face есептік жазбасы қажет екенін ескеріңіз. Бұған қоса, жүктеп алуды бастау үшін өзіңізді аутентификациялау үшін `huggingface-cli` орнатуыңыз қажет. Бұл туралы толығырақ [мына жерден](https://huggingface.co/docs/huggingface_hub/en/guides/cli) оқыңыз / Please note that you'll need a Hugging Face account to download the KazParC dataset. Additionally, you'll need to install `huggingface-cli` to authenticate yourself for the download to commence. Read more about it [here](https://huggingface.co/docs/huggingface_hub/en/guides/cli).

- Егер сіз Windows жүйесінде болсаңыз, `get_data.sh` сценарийі жұмыс істемеуі мүмкін. Дегенмен, файлдағы сілтемелерді орындап, ондағы қадамдарды қолмен орындау арқылы әлі де деректерді алуға болады. Сол сияқты, `generate_clean_corpus.sh` файлында да қате пайда болады, бұл `kazakh_latin_corpus.json` файлындағы бос немесе бос жолдарды сүзу, сондай-ақ оны араластыру үшін Windows жүйесінің баламалы мүмкіндігін табуды талап етеді. Бұған қоса, `wikiextractor` бумасын алдын ала орнатқаныңызға сенімді болыңыз (нақты пайдаланылған нұсқаны `requirements.txt` файлынан табуға болады) / If you're on Windows, the `get_data.sh` script likely won't work. However, you can still get the data by following the links in the file and manually doing the steps in there. Likewise, `generate_clean_corpus.sh` will also error out, requiring you to find an equivalent Windows functionality to filter out blank or empty lines in the `kazakh_latin_corpus.json` file, as well as shuffle it. Additionally, be sure to install the `wikiextractor` package beforehand (the exact version used can be found in the `requirements.txt` file).
+ Егер сіз Windows жүйесінде болсаңыз, `get_data.sh` сценарийі жұмыс істемеуі мүмкін. Дегенмен, файлдағы сілтемелерді орындап, ондағы қадамдарды қолмен орындау арқылы әлі де деректерді алуға болады. Сол сияқты, `generate_clean_corpus.sh` файлында да қате пайда болады, бұл `kazakh_latin_corpus.json` файлындағы бос немесе бос жолдарды сүзу, сондай-ақ оны араластыру үшін Windows жүйесінің баламалы мүмкіндігін табуды талап етеді. Бұған қоса, `wikiextractor` және `sentencepiece` бумасын алдын ала орнатқаныңызға сенімді болыңыз (нақты пайдаланылған нұсқаны `requirements.txt` файлынан табуға болады). / If you're on Windows, the `get_data.sh` script likely won't work. However, you can still get the data by following the links in the file and manually doing the steps in there. Likewise, `generate_clean_corpus.sh` will also error out, requiring you to find an equivalent Windows functionality to filter out blank or empty lines in the `kazakh_latin_corpus.json` file, as well as shuffle it. Additionally, be sure to install the `wikiextractor` and `sentencepiece` package beforehand (the exact version used can be found in the `requirements.txt` file).

  ---

@@ -134,7 +134,7 @@ KazParC деректер жинағын жүктеп алу үшін сізге

  - **DalaT5 v4**: 23 сәуірде нақтыланған, 23 сәуірде қолжетімді болды. Жаттығу үшін ~1,9 миллион жазба (Wikipedia dump + CC100 + KazParC) пайдаланылды. Семантикалық түсініктің жоғарылауын көрсететін төртінші итерация / Fine-tuned on April 23, made available on April 23. Used ~1.9 million records (Wikipedia dump + CC100 + KazParC) for training. Fourth iteration that showed increased semantic understanding

- - **DalaT5 v5**: 24 сәуірде болатын нақты баптау сол күні шығарылады. ~1,9 миллион жазбаны (v4 сияқты) пайдалануға және қазақ кириллица және латын графикасын жақсырақ өңдеу үшін жеке таңбалауышқа ие болуға орнату / Fine-tuning taking place as of April 24, will be released on the same day. Set to use ~1.9 million records (like v4) and have its own tokeniser to better handle the Kazakh Cyrillic and Latin scripts
+ - **DalaT5 v5**: 24 сәуірде болатын нақты баптау сол күні шығарылады. ~1,9 миллион жазбаны (v4 сияқты) пайдалануға және қазақ кириллица және латын графикасын жақсырақ өңдеу үшін жеке таңбалауышқа ие болуға орнату / Fine-tuning will take place on April 25 and will be released on the same day. Set to use ~1.9 million records (like v4) and have its own tokeniser to better handle the Kazakh Cyrillic and Latin scripts

  ---

checkpoints/checkpoint-61500/model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:08bb2f5549ca7a4686fa163f40a3a840a02556dd674b562f51a7d4b3bbb25446
+ oid sha256:68a63cff9119ec9d2214a4d45626957783fc01857d6c7e6760338d0ac8d4d117
  size 242041896
checkpoints/checkpoint-61500/optimizer.pt CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:b3c0b1cfae7064c3e583c81fc7b89d7d3c6a5688579e6e54be3d400f2a760681
+ oid sha256:3342964c9be74032ffa9b4cff58e4f7eaed7adc5ad296183509cf0a30227c853
  size 484163514
checkpoints/checkpoint-61500/rng_state.pth CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:451ec28fae2c51302e3d439040b8e5e6eb5b4c0d1c58af17230cc30adbfb2190
+ oid sha256:08afadc7893d56386d744b1e0cbce95e6d6eff9ef96e2c2b3486065fbc550164
  size 14244
checkpoints/checkpoint-61500/scaler.pt CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:8f9ec04d900cba2a2e93335548c511413a07c2ec81cdb8a6699d3923fe215e69
+ oid sha256:0acbdad870bdcd3f627d6745b7dee6dd7cb36b559345a9d75527af16d7dec0cc
  size 988
checkpoints/checkpoint-61500/scheduler.pt CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:006aa6abdcde538a5090b2a8f8114d416f5dd26532791f36d1980c632d584d30
+ oid sha256:c7fac910dbf1051bedbd6fe5b67e886e57f5257d7d0c137811a8f5950f00476b
  size 1064
checkpoints/checkpoint-61500/special_tokens_map.json CHANGED
@@ -101,6 +101,13 @@
  "<extra_id_98>",
  "<extra_id_99>"
  ],
+ "bos_token": {
+ "content": "<s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
  "eos_token": {
  "content": "</s>",
  "lstrip": false,
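
The hunk above adds a `bos_token` entry to `special_tokens_map.json`. As a minimal sketch of how such a change could be sanity-checked, the snippet below parses a small excerpt and reads back the token strings. The excerpt is a hypothetical, shortened stand-in for the real file, and the helper `token_content` is an illustrative name, not something from this repository.

```python
import json

# Hypothetical, shortened excerpt mirroring the entries visible in the hunk.
# A real check would read the checkpoint's special_tokens_map.json instead.
SPECIAL_TOKENS_MAP = """
{
  "bos_token": {
    "content": "<s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "</s>",
    "lstrip": false
  }
}
"""


def token_content(tokens_map, name):
    """Return the literal string of a special token, or None if it is absent."""
    entry = tokens_map.get(name)
    if entry is None:
        return None
    # An entry may be a plain string or an object with a "content" field.
    return entry if isinstance(entry, str) else entry.get("content")


tokens_map = json.loads(SPECIAL_TOKENS_MAP)
bos = token_content(tokens_map, "bos_token")
eos = token_content(tokens_map, "eos_token")
print("bos:", bos, "eos:", eos)
```

Running the same helper against each checkpoint's file would be one way to confirm that all three checkpoints received the same `bos_token` entry.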
checkpoints/checkpoint-61500/tokenizer.json CHANGED
The diff for this file is too large to render. See raw diff
 
checkpoints/checkpoint-61500/tokenizer_config.json CHANGED
@@ -10,7 +10,7 @@
10
  "special": true
11
  },
12
  "1": {
13
- "content": "</s>",
14
  "lstrip": false,
15
  "normalized": false,
16
  "rstrip": false,
@@ -18,414 +18,414 @@
18
  "special": true
19
  },
20
  "2": {
21
- "content": "<unk>",
22
  "lstrip": false,
23
  "normalized": false,
24
  "rstrip": false,
25
  "single_word": false,
26
  "special": true
27
  },
28
- "32000": {
29
- "content": "<extra_id_99>",
30
  "lstrip": false,
31
  "normalized": false,
32
  "rstrip": false,
33
  "single_word": false,
34
  "special": true
35
  },
36
- "32001": {
37
- "content": "<extra_id_98>",
38
  "lstrip": false,
39
  "normalized": false,
40
  "rstrip": false,
41
  "single_word": false,
42
  "special": true
43
  },
44
- "32002": {
45
- "content": "<extra_id_97>",
46
  "lstrip": false,
47
  "normalized": false,
48
  "rstrip": false,
49
  "single_word": false,
50
  "special": true
51
  },
52
- "32003": {
53
- "content": "<extra_id_96>",
54
  "lstrip": false,
55
  "normalized": false,
56
  "rstrip": false,
57
  "single_word": false,
58
  "special": true
59
  },
60
- "32004": {
61
- "content": "<extra_id_95>",
62
  "lstrip": false,
63
  "normalized": false,
64
  "rstrip": false,
65
  "single_word": false,
66
  "special": true
67
  },
68
- "32005": {
69
- "content": "<extra_id_94>",
70
  "lstrip": false,
71
  "normalized": false,
72
  "rstrip": false,
73
  "single_word": false,
74
  "special": true
75
  },
76
- "32006": {
77
- "content": "<extra_id_93>",
78
  "lstrip": false,
79
  "normalized": false,
80
  "rstrip": false,
81
  "single_word": false,
82
  "special": true
83
  },
84
- "32007": {
85
- "content": "<extra_id_92>",
86
  "lstrip": false,
87
  "normalized": false,
88
  "rstrip": false,
89
  "single_word": false,
90
  "special": true
91
  },
92
- "32008": {
93
- "content": "<extra_id_91>",
94
  "lstrip": false,
95
  "normalized": false,
96
  "rstrip": false,
97
  "single_word": false,
98
  "special": true
99
  },
100
- "32009": {
101
- "content": "<extra_id_90>",
102
  "lstrip": false,
103
  "normalized": false,
104
  "rstrip": false,
105
  "single_word": false,
106
  "special": true
107
  },
108
- "32010": {
109
- "content": "<extra_id_89>",
110
  "lstrip": false,
111
  "normalized": false,
112
  "rstrip": false,
113
  "single_word": false,
114
  "special": true
115
  },
116
- "32011": {
117
- "content": "<extra_id_88>",
118
  "lstrip": false,
119
  "normalized": false,
120
  "rstrip": false,
121
  "single_word": false,
122
  "special": true
123
  },
124
- "32012": {
125
- "content": "<extra_id_87>",
126
  "lstrip": false,
127
  "normalized": false,
128
  "rstrip": false,
129
  "single_word": false,
130
  "special": true
131
  },
132
- "32013": {
133
- "content": "<extra_id_86>",
134
  "lstrip": false,
135
  "normalized": false,
136
  "rstrip": false,
137
  "single_word": false,
138
  "special": true
139
  },
140
- "32014": {
141
- "content": "<extra_id_85>",
142
  "lstrip": false,
143
  "normalized": false,
144
  "rstrip": false,
145
  "single_word": false,
146
  "special": true
147
  },
148
- "32015": {
149
- "content": "<extra_id_84>",
150
  "lstrip": false,
151
  "normalized": false,
152
  "rstrip": false,
153
  "single_word": false,
154
  "special": true
155
  },
156
- "32016": {
157
- "content": "<extra_id_83>",
158
  "lstrip": false,
159
  "normalized": false,
160
  "rstrip": false,
161
  "single_word": false,
162
  "special": true
163
  },
164
- "32017": {
165
- "content": "<extra_id_82>",
166
  "lstrip": false,
167
  "normalized": false,
168
  "rstrip": false,
169
  "single_word": false,
170
  "special": true
171
  },
172
- "32018": {
173
- "content": "<extra_id_81>",
174
  "lstrip": false,
175
  "normalized": false,
176
  "rstrip": false,
177
  "single_word": false,
178
  "special": true
179
  },
180
- "32019": {
181
- "content": "<extra_id_80>",
182
  "lstrip": false,
183
  "normalized": false,
184
  "rstrip": false,
185
  "single_word": false,
186
  "special": true
187
  },
188
- "32020": {
189
- "content": "<extra_id_79>",
190
  "lstrip": false,
191
  "normalized": false,
192
  "rstrip": false,
193
  "single_word": false,
194
  "special": true
195
  },
196
- "32021": {
197
- "content": "<extra_id_78>",
198
  "lstrip": false,
199
  "normalized": false,
200
  "rstrip": false,
201
  "single_word": false,
202
  "special": true
203
  },
204
- "32022": {
205
- "content": "<extra_id_77>",
206
  "lstrip": false,
207
  "normalized": false,
208
  "rstrip": false,
209
  "single_word": false,
210
  "special": true
211
  },
212
- "32023": {
213
- "content": "<extra_id_76>",
214
  "lstrip": false,
215
  "normalized": false,
216
  "rstrip": false,
217
  "single_word": false,
218
  "special": true
219
  },
220
- "32024": {
221
- "content": "<extra_id_75>",
222
  "lstrip": false,
223
  "normalized": false,
224
  "rstrip": false,
225
  "single_word": false,
226
  "special": true
227
  },
228
- "32025": {
229
- "content": "<extra_id_74>",
230
  "lstrip": false,
231
  "normalized": false,
232
  "rstrip": false,
233
  "single_word": false,
234
  "special": true
235
  },
236
- "32026": {
237
- "content": "<extra_id_73>",
238
  "lstrip": false,
239
  "normalized": false,
240
  "rstrip": false,
241
  "single_word": false,
242
  "special": true
243
  },
244
- "32027": {
245
- "content": "<extra_id_72>",
246
  "lstrip": false,
247
  "normalized": false,
248
  "rstrip": false,
249
  "single_word": false,
250
  "special": true
251
  },
252
- "32028": {
253
- "content": "<extra_id_71>",
254
  "lstrip": false,
255
  "normalized": false,
256
  "rstrip": false,
257
  "single_word": false,
258
  "special": true
259
  },
260
- "32029": {
261
- "content": "<extra_id_70>",
262
  "lstrip": false,
263
  "normalized": false,
264
  "rstrip": false,
265
  "single_word": false,
266
  "special": true
267
  },
268
- "32030": {
269
- "content": "<extra_id_69>",
270
  "lstrip": false,
271
  "normalized": false,
272
  "rstrip": false,
273
  "single_word": false,
274
  "special": true
275
  },
276
- "32031": {
277
- "content": "<extra_id_68>",
278
  "lstrip": false,
279
  "normalized": false,
280
  "rstrip": false,
281
  "single_word": false,
282
  "special": true
283
  },
284
- "32032": {
285
- "content": "<extra_id_67>",
286
  "lstrip": false,
287
  "normalized": false,
288
  "rstrip": false,
289
  "single_word": false,
290
  "special": true
291
  },
292
- "32033": {
293
- "content": "<extra_id_66>",
294
  "lstrip": false,
295
  "normalized": false,
296
  "rstrip": false,
297
  "single_word": false,
298
  "special": true
299
  },
300
- "32034": {
301
- "content": "<extra_id_65>",
302
  "lstrip": false,
303
  "normalized": false,
304
  "rstrip": false,
305
  "single_word": false,
306
  "special": true
307
  },
308
- "32035": {
309
- "content": "<extra_id_64>",
310
  "lstrip": false,
311
  "normalized": false,
312
  "rstrip": false,
313
  "single_word": false,
314
  "special": true
315
  },
316
- "32036": {
317
- "content": "<extra_id_63>",
318
  "lstrip": false,
319
  "normalized": false,
320
  "rstrip": false,
321
  "single_word": false,
322
  "special": true
323
  },
324
- "32037": {
325
- "content": "<extra_id_62>",
326
  "lstrip": false,
327
  "normalized": false,
328
  "rstrip": false,
329
  "single_word": false,
330
  "special": true
331
  },
332
- "32038": {
333
- "content": "<extra_id_61>",
334
  "lstrip": false,
335
  "normalized": false,
336
  "rstrip": false,
337
  "single_word": false,
338
  "special": true
339
  },
340
- "32039": {
341
- "content": "<extra_id_60>",
342
  "lstrip": false,
343
  "normalized": false,
344
  "rstrip": false,
345
  "single_word": false,
346
  "special": true
347
  },
348
- "32040": {
349
- "content": "<extra_id_59>",
350
  "lstrip": false,
351
  "normalized": false,
352
  "rstrip": false,
353
  "single_word": false,
354
  "special": true
355
  },
356
- "32041": {
357
- "content": "<extra_id_58>",
358
  "lstrip": false,
359
  "normalized": false,
360
  "rstrip": false,
361
  "single_word": false,
362
  "special": true
363
  },
364
- "32042": {
365
- "content": "<extra_id_57>",
366
  "lstrip": false,
367
  "normalized": false,
368
  "rstrip": false,
369
  "single_word": false,
370
  "special": true
371
  },
372
- "32043": {
373
- "content": "<extra_id_56>",
374
  "lstrip": false,
375
  "normalized": false,
376
  "rstrip": false,
377
  "single_word": false,
378
  "special": true
379
  },
380
- "32044": {
381
- "content": "<extra_id_55>",
382
  "lstrip": false,
383
  "normalized": false,
384
  "rstrip": false,
385
  "single_word": false,
386
  "special": true
387
  },
388
- "32045": {
389
- "content": "<extra_id_54>",
390
  "lstrip": false,
391
  "normalized": false,
392
  "rstrip": false,
393
  "single_word": false,
394
  "special": true
395
  },
396
- "32046": {
397
- "content": "<extra_id_53>",
398
  "lstrip": false,
399
  "normalized": false,
400
  "rstrip": false,
401
  "single_word": false,
402
  "special": true
403
  },
404
- "32047": {
405
- "content": "<extra_id_52>",
406
  "lstrip": false,
407
  "normalized": false,
408
  "rstrip": false,
409
  "single_word": false,
410
  "special": true
411
  },
412
- "32048": {
413
- "content": "<extra_id_51>",
414
  "lstrip": false,
415
  "normalized": false,
416
  "rstrip": false,
417
  "single_word": false,
418
  "special": true
419
  },
420
- "32049": {
421
- "content": "<extra_id_50>",
422
  "lstrip": false,
423
  "normalized": false,
424
  "rstrip": false,
425
  "single_word": false,
426
  "special": true
427
  },
428
- "32050": {
429
  "content": "<extra_id_49>",
430
  "lstrip": false,
431
  "normalized": false,
@@ -433,392 +433,400 @@
433
  "single_word": false,
434
  "special": true
435
  },
436
- "32051": {
437
- "content": "<extra_id_48>",
438
  "lstrip": false,
439
  "normalized": false,
440
  "rstrip": false,
441
  "single_word": false,
442
  "special": true
443
  },
444
- "32052": {
445
- "content": "<extra_id_47>",
446
  "lstrip": false,
447
  "normalized": false,
448
  "rstrip": false,
449
  "single_word": false,
450
  "special": true
451
  },
452
- "32053": {
453
- "content": "<extra_id_46>",
454
  "lstrip": false,
455
  "normalized": false,
456
  "rstrip": false,
457
  "single_word": false,
458
  "special": true
459
  },
460
- "32054": {
461
- "content": "<extra_id_45>",
462
  "lstrip": false,
463
  "normalized": false,
464
  "rstrip": false,
465
  "single_word": false,
466
  "special": true
467
  },
468
- "32055": {
469
- "content": "<extra_id_44>",
470
  "lstrip": false,
471
  "normalized": false,
472
  "rstrip": false,
473
  "single_word": false,
474
  "special": true
475
  },
476
- "32056": {
477
- "content": "<extra_id_43>",
478
  "lstrip": false,
479
  "normalized": false,
480
  "rstrip": false,
481
  "single_word": false,
482
  "special": true
483
  },
484
- "32057": {
485
- "content": "<extra_id_42>",
486
  "lstrip": false,
487
  "normalized": false,
488
  "rstrip": false,
489
  "single_word": false,
490
  "special": true
491
  },
492
- "32058": {
493
- "content": "<extra_id_41>",
494
  "lstrip": false,
495
  "normalized": false,
496
  "rstrip": false,
497
  "single_word": false,
498
  "special": true
499
  },
500
- "32059": {
501
- "content": "<extra_id_40>",
502
  "lstrip": false,
503
  "normalized": false,
504
  "rstrip": false,
505
  "single_word": false,
506
  "special": true
507
  },
508
- "32060": {
509
- "content": "<extra_id_39>",
510
  "lstrip": false,
511
  "normalized": false,
512
  "rstrip": false,
513
  "single_word": false,
514
  "special": true
515
  },
516
- "32061": {
517
- "content": "<extra_id_38>",
518
  "lstrip": false,
519
  "normalized": false,
520
  "rstrip": false,
521
  "single_word": false,
522
  "special": true
523
  },
524
- "32062": {
525
- "content": "<extra_id_37>",
526
  "lstrip": false,
527
  "normalized": false,
528
  "rstrip": false,
529
  "single_word": false,
530
  "special": true
531
  },
532
- "32063": {
533
- "content": "<extra_id_36>",
534
  "lstrip": false,
535
  "normalized": false,
536
  "rstrip": false,
537
  "single_word": false,
538
  "special": true
539
  },
540
- "32064": {
541
- "content": "<extra_id_35>",
542
  "lstrip": false,
543
  "normalized": false,
544
  "rstrip": false,
545
  "single_word": false,
546
  "special": true
547
  },
548
- "32065": {
549
- "content": "<extra_id_34>",
550
  "lstrip": false,
551
  "normalized": false,
552
  "rstrip": false,
553
  "single_word": false,
554
  "special": true
555
  },
556
- "32066": {
557
- "content": "<extra_id_33>",
558
  "lstrip": false,
559
  "normalized": false,
560
  "rstrip": false,
561
  "single_word": false,
562
  "special": true
563
  },
564
- "32067": {
565
- "content": "<extra_id_32>",
566
  "lstrip": false,
567
  "normalized": false,
568
  "rstrip": false,
569
  "single_word": false,
570
  "special": true
571
  },
572
- "32068": {
573
- "content": "<extra_id_31>",
574
  "lstrip": false,
575
  "normalized": false,
576
  "rstrip": false,
577
  "single_word": false,
578
  "special": true
579
  },
580
- "32069": {
581
- "content": "<extra_id_30>",
582
  "lstrip": false,
583
  "normalized": false,
584
  "rstrip": false,
585
  "single_word": false,
586
  "special": true
587
  },
588
- "32070": {
589
- "content": "<extra_id_29>",
590
  "lstrip": false,
591
  "normalized": false,
592
  "rstrip": false,
593
  "single_word": false,
594
  "special": true
595
  },
596
- "32071": {
597
- "content": "<extra_id_28>",
598
  "lstrip": false,
599
  "normalized": false,
600
  "rstrip": false,
601
  "single_word": false,
602
  "special": true
603
  },
604
- "32072": {
605
- "content": "<extra_id_27>",
606
  "lstrip": false,
607
  "normalized": false,
608
  "rstrip": false,
609
  "single_word": false,
610
  "special": true
611
  },
612
- "32073": {
613
- "content": "<extra_id_26>",
614
  "lstrip": false,
615
  "normalized": false,
616
  "rstrip": false,
617
  "single_word": false,
618
  "special": true
619
  },
620
- "32074": {
621
- "content": "<extra_id_25>",
622
  "lstrip": false,
623
  "normalized": false,
624
  "rstrip": false,
625
  "single_word": false,
626
  "special": true
627
  },
628
- "32075": {
629
- "content": "<extra_id_24>",
630
  "lstrip": false,
631
  "normalized": false,
632
  "rstrip": false,
633
  "single_word": false,
634
  "special": true
635
  },
636
- "32076": {
637
- "content": "<extra_id_23>",
638
  "lstrip": false,
639
  "normalized": false,
640
  "rstrip": false,
641
  "single_word": false,
642
  "special": true
643
  },
644
- "32077": {
645
- "content": "<extra_id_22>",
646
  "lstrip": false,
647
  "normalized": false,
648
  "rstrip": false,
649
  "single_word": false,
650
  "special": true
651
  },
652
- "32078": {
653
- "content": "<extra_id_21>",
654
  "lstrip": false,
655
  "normalized": false,
656
  "rstrip": false,
657
  "single_word": false,
658
  "special": true
659
  },
660
- "32079": {
661
- "content": "<extra_id_20>",
662
  "lstrip": false,
663
  "normalized": false,
664
  "rstrip": false,
665
  "single_word": false,
666
  "special": true
667
  },
668
- "32080": {
669
- "content": "<extra_id_19>",
670
  "lstrip": false,
671
  "normalized": false,
672
  "rstrip": false,
673
  "single_word": false,
674
  "special": true
675
  },
676
- "32081": {
677
- "content": "<extra_id_18>",
678
  "lstrip": false,
679
  "normalized": false,
680
  "rstrip": false,
681
  "single_word": false,
682
  "special": true
683
  },
684
- "32082": {
685
- "content": "<extra_id_17>",
686
  "lstrip": false,
687
  "normalized": false,
688
  "rstrip": false,
689
  "single_word": false,
690
  "special": true
691
  },
692
- "32083": {
693
- "content": "<extra_id_16>",
694
  "lstrip": false,
695
  "normalized": false,
696
  "rstrip": false,
697
  "single_word": false,
698
  "special": true
699
  },
700
- "32084": {
701
- "content": "<extra_id_15>",
702
  "lstrip": false,
703
  "normalized": false,
704
  "rstrip": false,
705
  "single_word": false,
706
  "special": true
707
  },
708
- "32085": {
709
- "content": "<extra_id_14>",
710
  "lstrip": false,
711
  "normalized": false,
712
  "rstrip": false,
713
  "single_word": false,
714
  "special": true
715
  },
716
- "32086": {
717
- "content": "<extra_id_13>",
718
  "lstrip": false,
719
  "normalized": false,
720
  "rstrip": false,
721
  "single_word": false,
722
  "special": true
723
  },
724
- "32087": {
725
- "content": "<extra_id_12>",
726
  "lstrip": false,
727
  "normalized": false,
728
  "rstrip": false,
729
  "single_word": false,
730
  "special": true
731
  },
732
- "32088": {
733
- "content": "<extra_id_11>",
734
  "lstrip": false,
735
  "normalized": false,
736
  "rstrip": false,
737
  "single_word": false,
738
  "special": true
739
  },
740
- "32089": {
741
- "content": "<extra_id_10>",
742
  "lstrip": false,
743
  "normalized": false,
744
  "rstrip": false,
745
  "single_word": false,
746
  "special": true
747
  },
748
- "32090": {
749
- "content": "<extra_id_9>",
750
  "lstrip": false,
751
  "normalized": false,
752
  "rstrip": false,
753
  "single_word": false,
754
  "special": true
755
  },
756
- "32091": {
757
- "content": "<extra_id_8>",
758
  "lstrip": false,
759
  "normalized": false,
760
  "rstrip": false,
761
  "single_word": false,
762
  "special": true
763
  },
764
- "32092": {
765
- "content": "<extra_id_7>",
766
  "lstrip": false,
767
  "normalized": false,
768
  "rstrip": false,
769
  "single_word": false,
770
  "special": true
771
  },
772
- "32093": {
773
- "content": "<extra_id_6>",
774
  "lstrip": false,
775
  "normalized": false,
776
  "rstrip": false,
777
  "single_word": false,
778
  "special": true
779
  },
780
- "32094": {
781
- "content": "<extra_id_5>",
782
  "lstrip": false,
783
  "normalized": false,
784
  "rstrip": false,
785
  "single_word": false,
786
  "special": true
787
  },
788
- "32095": {
789
- "content": "<extra_id_4>",
790
  "lstrip": false,
791
  "normalized": false,
792
  "rstrip": false,
793
  "single_word": false,
794
  "special": true
795
  },
796
- "32096": {
797
- "content": "<extra_id_3>",
798
  "lstrip": false,
799
  "normalized": false,
800
  "rstrip": false,
801
  "single_word": false,
802
  "special": true
803
  },
804
- "32097": {
805
- "content": "<extra_id_2>",
806
  "lstrip": false,
807
  "normalized": false,
808
  "rstrip": false,
809
  "single_word": false,
810
  "special": true
811
  },
812
- "32098": {
813
- "content": "<extra_id_1>",
814
  "lstrip": false,
815
  "normalized": false,
816
  "rstrip": false,
817
  "single_word": false,
818
  "special": true
819
  },
820
- "32099": {
821
- "content": "<extra_id_0>",
 
 
 
 
 
 
 
 
822
  "lstrip": false,
823
  "normalized": false,
824
  "rstrip": false,
@@ -928,12 +936,13 @@
928
  "<extra_id_98>",
929
  "<extra_id_99>"
930
  ],
931
- "clean_up_tokenization_spaces": true,
 
932
  "eos_token": "</s>",
933
  "extra_ids": 100,
934
  "extra_special_tokens": {},
935
- "model_max_length": 512,
936
  "pad_token": "<pad>",
937
- "tokenizer_class": "T5Tokenizer",
938
  "unk_token": "<unk>"
939
  }
 
10
  "special": true
11
  },
12
  "1": {
13
+ "content": "<s>",
14
  "lstrip": false,
15
  "normalized": false,
16
  "rstrip": false,
 
18
  "special": true
19
  },
20
  "2": {
21
+ "content": "</s>",
22
  "lstrip": false,
23
  "normalized": false,
24
  "rstrip": false,
25
  "single_word": false,
26
  "special": true
27
  },
28
+ "3": {
29
+ "content": "<unk>",
30
  "lstrip": false,
31
  "normalized": false,
32
  "rstrip": false,
33
  "single_word": false,
34
  "special": true
35
  },
36
+ "8000": {
37
+ "content": "<extra_id_0>",
38
  "lstrip": false,
39
  "normalized": false,
40
  "rstrip": false,
41
  "single_word": false,
42
  "special": true
43
  },
44
+ "8001": {
45
+ "content": "<extra_id_1>",
46
  "lstrip": false,
47
  "normalized": false,
48
  "rstrip": false,
49
  "single_word": false,
50
  "special": true
51
  },
52
+ "8002": {
53
+ "content": "<extra_id_2>",
54
  "lstrip": false,
55
  "normalized": false,
56
  "rstrip": false,
57
  "single_word": false,
58
  "special": true
59
  },
60
+ "8003": {
61
+ "content": "<extra_id_3>",
62
  "lstrip": false,
63
  "normalized": false,
64
  "rstrip": false,
65
  "single_word": false,
66
  "special": true
67
  },
68
+ "8004": {
69
+ "content": "<extra_id_4>",
70
  "lstrip": false,
71
  "normalized": false,
72
  "rstrip": false,
73
  "single_word": false,
74
  "special": true
75
  },
76
+ "8005": {
77
+ "content": "<extra_id_5>",
78
  "lstrip": false,
79
  "normalized": false,
80
  "rstrip": false,
81
  "single_word": false,
82
  "special": true
83
  },
84
+ "8006": {
85
+ "content": "<extra_id_6>",
86
  "lstrip": false,
87
  "normalized": false,
88
  "rstrip": false,
89
  "single_word": false,
90
  "special": true
91
  },
92
+ "8007": {
93
+ "content": "<extra_id_7>",
94
  "lstrip": false,
95
  "normalized": false,
96
  "rstrip": false,
97
  "single_word": false,
98
  "special": true
99
  },
100
+ "8008": {
101
+ "content": "<extra_id_8>",
102
  "lstrip": false,
103
  "normalized": false,
104
  "rstrip": false,
105
  "single_word": false,
106
  "special": true
107
  },
108
+ "8009": {
109
+ "content": "<extra_id_9>",
110
  "lstrip": false,
111
  "normalized": false,
112
  "rstrip": false,
113
  "single_word": false,
114
  "special": true
115
  },
116
+ "8010": {
117
+ "content": "<extra_id_10>",
118
  "lstrip": false,
119
  "normalized": false,
120
  "rstrip": false,
121
  "single_word": false,
122
  "special": true
123
  },
124
+ "8011": {
125
+ "content": "<extra_id_11>",
126
  "lstrip": false,
127
  "normalized": false,
128
  "rstrip": false,
129
  "single_word": false,
130
  "special": true
131
  },
132
+ "8012": {
133
+ "content": "<extra_id_12>",
134
  "lstrip": false,
135
  "normalized": false,
136
  "rstrip": false,
137
  "single_word": false,
138
  "special": true
139
  },
140
+ "8013": {
141
+ "content": "<extra_id_13>",
142
  "lstrip": false,
143
  "normalized": false,
144
  "rstrip": false,
145
  "single_word": false,
146
  "special": true
147
  },
148
+ "8014": {
149
+ "content": "<extra_id_14>",
150
  "lstrip": false,
151
  "normalized": false,
152
  "rstrip": false,
153
  "single_word": false,
154
  "special": true
155
  },
156
+ "8015": {
157
+ "content": "<extra_id_15>",
158
  "lstrip": false,
159
  "normalized": false,
160
  "rstrip": false,
161
  "single_word": false,
162
  "special": true
163
  },
164
+ "8016": {
165
+ "content": "<extra_id_16>",
166
  "lstrip": false,
167
  "normalized": false,
168
  "rstrip": false,
169
  "single_word": false,
170
  "special": true
171
  },
172
+ "8017": {
173
+ "content": "<extra_id_17>",
174
  "lstrip": false,
175
  "normalized": false,
176
  "rstrip": false,
177
  "single_word": false,
178
  "special": true
179
  },
180
+ "8018": {
181
+ "content": "<extra_id_18>",
182
  "lstrip": false,
183
  "normalized": false,
184
  "rstrip": false,
185
  "single_word": false,
186
  "special": true
187
  },
188
+ "8019": {
189
+ "content": "<extra_id_19>",
190
  "lstrip": false,
191
  "normalized": false,
192
  "rstrip": false,
193
  "single_word": false,
194
  "special": true
195
  },
196
+ "8020": {
197
+ "content": "<extra_id_20>",
198
  "lstrip": false,
199
  "normalized": false,
200
  "rstrip": false,
201
  "single_word": false,
202
  "special": true
203
  },
204
+ "8021": {
205
+ "content": "<extra_id_21>",
206
  "lstrip": false,
207
  "normalized": false,
208
  "rstrip": false,
209
  "single_word": false,
210
  "special": true
211
  },
212
+ "8022": {
213
+ "content": "<extra_id_22>",
214
  "lstrip": false,
215
  "normalized": false,
216
  "rstrip": false,
217
  "single_word": false,
218
  "special": true
219
  },
220
+ "8023": {
221
+ "content": "<extra_id_23>",
222
  "lstrip": false,
223
  "normalized": false,
224
  "rstrip": false,
225
  "single_word": false,
226
  "special": true
227
  },
228
+ "8024": {
229
+ "content": "<extra_id_24>",
230
  "lstrip": false,
231
  "normalized": false,
232
  "rstrip": false,
233
  "single_word": false,
234
  "special": true
235
  },
236
+ "8025": {
237
+ "content": "<extra_id_25>",
238
  "lstrip": false,
239
  "normalized": false,
240
  "rstrip": false,
241
  "single_word": false,
242
  "special": true
243
  },
244
+ "8026": {
245
+ "content": "<extra_id_26>",
246
  "lstrip": false,
247
  "normalized": false,
248
  "rstrip": false,
249
  "single_word": false,
250
  "special": true
251
  },
252
+ "8027": {
253
+ "content": "<extra_id_27>",
254
  "lstrip": false,
255
  "normalized": false,
256
  "rstrip": false,
257
  "single_word": false,
258
  "special": true
259
  },
260
+ "8028": {
261
+ "content": "<extra_id_28>",
262
  "lstrip": false,
263
  "normalized": false,
264
  "rstrip": false,
265
  "single_word": false,
266
  "special": true
267
  },
268
+ "8029": {
269
+ "content": "<extra_id_29>",
270
  "lstrip": false,
271
  "normalized": false,
272
  "rstrip": false,
273
  "single_word": false,
274
  "special": true
275
  },
276
+ "8030": {
277
+ "content": "<extra_id_30>",
278
  "lstrip": false,
279
  "normalized": false,
280
  "rstrip": false,
281
  "single_word": false,
282
  "special": true
283
  },
284
+ "8031": {
285
+ "content": "<extra_id_31>",
286
  "lstrip": false,
287
  "normalized": false,
288
  "rstrip": false,
289
  "single_word": false,
290
  "special": true
291
  },
292
+ "8032": {
293
+ "content": "<extra_id_32>",
294
  "lstrip": false,
295
  "normalized": false,
296
  "rstrip": false,
297
  "single_word": false,
298
  "special": true
299
  },
300
+ "8033": {
301
+ "content": "<extra_id_33>",
302
  "lstrip": false,
303
  "normalized": false,
304
  "rstrip": false,
305
  "single_word": false,
306
  "special": true
307
  },
308
+ "8034": {
309
+ "content": "<extra_id_34>",
310
  "lstrip": false,
311
  "normalized": false,
312
  "rstrip": false,
313
  "single_word": false,
314
  "special": true
315
  },
316
+ "8035": {
317
+ "content": "<extra_id_35>",
318
  "lstrip": false,
319
  "normalized": false,
320
  "rstrip": false,
321
  "single_word": false,
322
  "special": true
323
  },
324
+ "8036": {
325
+ "content": "<extra_id_36>",
326
  "lstrip": false,
327
  "normalized": false,
328
  "rstrip": false,
329
  "single_word": false,
330
  "special": true
331
  },
332
+ "8037": {
333
+ "content": "<extra_id_37>",
334
  "lstrip": false,
335
  "normalized": false,
336
  "rstrip": false,
337
  "single_word": false,
338
  "special": true
339
  },
340
+ "8038": {
341
+ "content": "<extra_id_38>",
342
  "lstrip": false,
343
  "normalized": false,
344
  "rstrip": false,
345
  "single_word": false,
346
  "special": true
347
  },
348
+ "8039": {
349
+ "content": "<extra_id_39>",
350
  "lstrip": false,
351
  "normalized": false,
352
  "rstrip": false,
353
  "single_word": false,
354
  "special": true
355
  },
356
+ "8040": {
357
+ "content": "<extra_id_40>",
358
  "lstrip": false,
359
  "normalized": false,
360
  "rstrip": false,
361
  "single_word": false,
362
  "special": true
363
  },
364
+ "8041": {
365
+ "content": "<extra_id_41>",
366
  "lstrip": false,
367
  "normalized": false,
368
  "rstrip": false,
369
  "single_word": false,
370
  "special": true
371
  },
372
+ "8042": {
373
+ "content": "<extra_id_42>",
374
  "lstrip": false,
375
  "normalized": false,
376
  "rstrip": false,
377
  "single_word": false,
378
  "special": true
379
  },
380
+ "8043": {
381
+ "content": "<extra_id_43>",
382
  "lstrip": false,
383
  "normalized": false,
384
  "rstrip": false,
385
  "single_word": false,
386
  "special": true
387
  },
388
+ "8044": {
389
+ "content": "<extra_id_44>",
390
  "lstrip": false,
391
  "normalized": false,
392
  "rstrip": false,
393
  "single_word": false,
394
  "special": true
395
  },
396
+ "8045": {
397
+ "content": "<extra_id_45>",
398
  "lstrip": false,
399
  "normalized": false,
400
  "rstrip": false,
401
  "single_word": false,
402
  "special": true
403
  },
404
+ "8046": {
405
+ "content": "<extra_id_46>",
406
  "lstrip": false,
407
  "normalized": false,
408
  "rstrip": false,
409
  "single_word": false,
410
  "special": true
411
  },
412
+ "8047": {
413
+ "content": "<extra_id_47>",
414
  "lstrip": false,
415
  "normalized": false,
416
  "rstrip": false,
417
  "single_word": false,
418
  "special": true
419
  },
420
+ "8048": {
421
+ "content": "<extra_id_48>",
422
  "lstrip": false,
423
  "normalized": false,
424
  "rstrip": false,
425
  "single_word": false,
426
  "special": true
427
  },
428
+ "8049": {
429
  "content": "<extra_id_49>",
430
  "lstrip": false,
431
  "normalized": false,
 
433
  "single_word": false,
434
  "special": true
435
  },
436
+ "8050": {
437
+ "content": "<extra_id_50>",
438
  "lstrip": false,
439
  "normalized": false,
440
  "rstrip": false,
441
  "single_word": false,
442
  "special": true
443
  },
444
+ "8051": {
445
+ "content": "<extra_id_51>",
446
  "lstrip": false,
447
  "normalized": false,
448
  "rstrip": false,
449
  "single_word": false,
450
  "special": true
451
  },
452
+ "8052": {
453
+ "content": "<extra_id_52>",
454
  "lstrip": false,
455
  "normalized": false,
456
  "rstrip": false,
457
  "single_word": false,
458
  "special": true
459
  },
460
+ "8053": {
461
+ "content": "<extra_id_53>",
462
  "lstrip": false,
463
  "normalized": false,
464
  "rstrip": false,
465
  "single_word": false,
466
  "special": true
467
  },
468
+ "8054": {
469
+ "content": "<extra_id_54>",
470
  "lstrip": false,
471
  "normalized": false,
472
  "rstrip": false,
473
  "single_word": false,
474
  "special": true
475
  },
476
+ "8055": {
477
+ "content": "<extra_id_55>",
478
  "lstrip": false,
479
  "normalized": false,
480
  "rstrip": false,
481
  "single_word": false,
482
  "special": true
483
  },
484
+ "8056": {
485
+ "content": "<extra_id_56>",
486
  "lstrip": false,
487
  "normalized": false,
488
  "rstrip": false,
489
  "single_word": false,
490
  "special": true
491
  },
492
+ "8057": {
493
+ "content": "<extra_id_57>",
494
  "lstrip": false,
495
  "normalized": false,
496
  "rstrip": false,
497
  "single_word": false,
498
  "special": true
499
  },
500
+ "8058": {
501
+ "content": "<extra_id_58>",
502
  "lstrip": false,
503
  "normalized": false,
504
  "rstrip": false,
505
  "single_word": false,
506
  "special": true
507
  },
508
+ "8059": {
509
+ "content": "<extra_id_59>",
510
  "lstrip": false,
511
  "normalized": false,
512
  "rstrip": false,
513
  "single_word": false,
514
  "special": true
515
  },
516
+ "8060": {
517
+ "content": "<extra_id_60>",
518
  "lstrip": false,
519
  "normalized": false,
520
  "rstrip": false,
521
  "single_word": false,
522
  "special": true
523
  },
524
+ "8061": {
525
+ "content": "<extra_id_61>",
526
  "lstrip": false,
527
  "normalized": false,
528
  "rstrip": false,
529
  "single_word": false,
530
  "special": true
531
  },
532
+ "8062": {
533
+ "content": "<extra_id_62>",
534
  "lstrip": false,
535
  "normalized": false,
536
  "rstrip": false,
537
  "single_word": false,
538
  "special": true
539
  },
540
+ "8063": {
541
+ "content": "<extra_id_63>",
542
  "lstrip": false,
543
  "normalized": false,
544
  "rstrip": false,
545
  "single_word": false,
546
  "special": true
547
  },
548
+ "8064": {
549
+ "content": "<extra_id_64>",
550
  "lstrip": false,
551
  "normalized": false,
552
  "rstrip": false,
553
  "single_word": false,
554
  "special": true
555
  },
556
+ "8065": {
557
+ "content": "<extra_id_65>",
558
  "lstrip": false,
559
  "normalized": false,
560
  "rstrip": false,
561
  "single_word": false,
562
  "special": true
563
  },
564
+ "8066": {
565
+ "content": "<extra_id_66>",
566
  "lstrip": false,
567
  "normalized": false,
568
  "rstrip": false,
569
  "single_word": false,
570
  "special": true
571
  },
572
+ "8067": {
573
+ "content": "<extra_id_67>",
574
  "lstrip": false,
575
  "normalized": false,
576
  "rstrip": false,
577
  "single_word": false,
578
  "special": true
579
  },
580
+ "8068": {
581
+ "content": "<extra_id_68>",
582
  "lstrip": false,
583
  "normalized": false,
584
  "rstrip": false,
585
  "single_word": false,
586
  "special": true
587
  },
588
+ "8069": {
589
+ "content": "<extra_id_69>",
590
  "lstrip": false,
591
  "normalized": false,
592
  "rstrip": false,
593
  "single_word": false,
594
  "special": true
595
  },
596
+ "8070": {
597
+ "content": "<extra_id_70>",
598
  "lstrip": false,
599
  "normalized": false,
600
  "rstrip": false,
601
  "single_word": false,
602
  "special": true
603
  },
604
+ "8071": {
605
+ "content": "<extra_id_71>",
606
  "lstrip": false,
607
  "normalized": false,
608
  "rstrip": false,
609
  "single_word": false,
610
  "special": true
611
  },
612
+ "8072": {
613
+ "content": "<extra_id_72>",
614
  "lstrip": false,
615
  "normalized": false,
616
  "rstrip": false,
617
  "single_word": false,
618
  "special": true
619
  },
620
+ "8073": {
621
+ "content": "<extra_id_73>",
622
  "lstrip": false,
623
  "normalized": false,
624
  "rstrip": false,
625
  "single_word": false,
626
  "special": true
627
  },
628
+ "8074": {
629
+ "content": "<extra_id_74>",
630
  "lstrip": false,
631
  "normalized": false,
632
  "rstrip": false,
633
  "single_word": false,
634
  "special": true
635
  },
636
+ "8075": {
637
+ "content": "<extra_id_75>",
638
  "lstrip": false,
639
  "normalized": false,
640
  "rstrip": false,
641
  "single_word": false,
642
  "special": true
643
  },
644
+ "8076": {
645
+ "content": "<extra_id_76>",
646
  "lstrip": false,
647
  "normalized": false,
648
  "rstrip": false,
649
  "single_word": false,
650
  "special": true
651
  },
652
+ "8077": {
653
+ "content": "<extra_id_77>",
654
  "lstrip": false,
655
  "normalized": false,
656
  "rstrip": false,
657
  "single_word": false,
658
  "special": true
659
  },
660
+ "8078": {
661
+ "content": "<extra_id_78>",
662
  "lstrip": false,
663
  "normalized": false,
664
  "rstrip": false,
665
  "single_word": false,
666
  "special": true
667
  },
668
+ "8079": {
669
+ "content": "<extra_id_79>",
670
  "lstrip": false,
671
  "normalized": false,
672
  "rstrip": false,
673
  "single_word": false,
674
  "special": true
675
  },
676
+ "8080": {
677
+ "content": "<extra_id_80>",
678
  "lstrip": false,
679
  "normalized": false,
680
  "rstrip": false,
681
  "single_word": false,
682
  "special": true
683
  },
684
+ "8081": {
685
+ "content": "<extra_id_81>",
686
  "lstrip": false,
687
  "normalized": false,
688
  "rstrip": false,
689
  "single_word": false,
690
  "special": true
691
  },
692
+ "8082": {
693
+ "content": "<extra_id_82>",
694
  "lstrip": false,
695
  "normalized": false,
696
  "rstrip": false,
697
  "single_word": false,
698
  "special": true
699
  },
700
+ "8083": {
701
+ "content": "<extra_id_83>",
702
  "lstrip": false,
703
  "normalized": false,
704
  "rstrip": false,
705
  "single_word": false,
706
  "special": true
707
  },
708
+ "8084": {
709
+ "content": "<extra_id_84>",
710
  "lstrip": false,
711
  "normalized": false,
712
  "rstrip": false,
713
  "single_word": false,
714
  "special": true
715
  },
716
+ "8085": {
717
+ "content": "<extra_id_85>",
718
  "lstrip": false,
719
  "normalized": false,
720
  "rstrip": false,
721
  "single_word": false,
722
  "special": true
723
  },
724
+ "8086": {
725
+ "content": "<extra_id_86>",
726
  "lstrip": false,
727
  "normalized": false,
728
  "rstrip": false,
729
  "single_word": false,
730
  "special": true
731
  },
732
+ "8087": {
733
+ "content": "<extra_id_87>",
734
  "lstrip": false,
735
  "normalized": false,
736
  "rstrip": false,
737
  "single_word": false,
738
  "special": true
739
  },
740
+ "8088": {
741
+ "content": "<extra_id_88>",
742
  "lstrip": false,
743
  "normalized": false,
744
  "rstrip": false,
745
  "single_word": false,
746
  "special": true
747
  },
748
+ "8089": {
749
+ "content": "<extra_id_89>",
750
  "lstrip": false,
751
  "normalized": false,
752
  "rstrip": false,
753
  "single_word": false,
754
  "special": true
755
  },
756
+ "8090": {
757
+ "content": "<extra_id_90>",
758
  "lstrip": false,
759
  "normalized": false,
760
  "rstrip": false,
761
  "single_word": false,
762
  "special": true
763
  },
764
+ "8091": {
765
+ "content": "<extra_id_91>",
766
  "lstrip": false,
767
  "normalized": false,
768
  "rstrip": false,
769
  "single_word": false,
770
  "special": true
771
  },
772
+ "8092": {
773
+ "content": "<extra_id_92>",
774
  "lstrip": false,
775
  "normalized": false,
776
  "rstrip": false,
777
  "single_word": false,
778
  "special": true
779
  },
780
+ "8093": {
781
+ "content": "<extra_id_93>",
782
  "lstrip": false,
783
  "normalized": false,
784
  "rstrip": false,
785
  "single_word": false,
786
  "special": true
787
  },
788
+ "8094": {
789
+ "content": "<extra_id_94>",
790
  "lstrip": false,
791
  "normalized": false,
792
  "rstrip": false,
793
  "single_word": false,
794
  "special": true
795
  },
796
+ "8095": {
797
+ "content": "<extra_id_95>",
798
  "lstrip": false,
799
  "normalized": false,
800
  "rstrip": false,
801
  "single_word": false,
802
  "special": true
803
  },
804
+ "8096": {
805
+ "content": "<extra_id_96>",
806
  "lstrip": false,
807
  "normalized": false,
808
  "rstrip": false,
809
  "single_word": false,
810
  "special": true
811
  },
812
+ "8097": {
813
+ "content": "<extra_id_97>",
814
  "lstrip": false,
815
  "normalized": false,
816
  "rstrip": false,
817
  "single_word": false,
818
  "special": true
819
  },
820
+ "8098": {
821
+ "content": "<extra_id_98>",
822
+ "lstrip": false,
823
+ "normalized": false,
824
+ "rstrip": false,
825
+ "single_word": false,
826
+ "special": true
827
+ },
828
+ "8099": {
829
+ "content": "<extra_id_99>",
830
  "lstrip": false,
831
  "normalized": false,
832
  "rstrip": false,
 
936
  "<extra_id_98>",
937
  "<extra_id_99>"
938
  ],
939
+ "bos_token": "<s>",
940
+ "clean_up_tokenization_spaces": false,
941
  "eos_token": "</s>",
942
  "extra_ids": 100,
943
  "extra_special_tokens": {},
944
+ "model_max_length": 1000000000000000019884624838656,
945
  "pad_token": "<pad>",
946
+ "tokenizer_class": "T5TokenizerFast",
947
  "unk_token": "<unk>"
948
  }
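Note on the sentinel block above: the added entries appear to follow one regular pattern, with ID 8000 + i mapped to `<extra_id_i>` for i = 0..99 and identical flags on every entry. The Python sketch below assumes that pattern holds for all 100 entries, including those outside the visible hunks. It rebuilds the mapping and reports any ID that deviates from it. The names `build_sentinel_entries`, `check_sentinels` and `first_id` are hypothetical and not part of this repository; the mapping is presumably the one stored under `added_tokens_decoder` in `tokenizer_config.json`, but that key is not visible in this hunk.

```python
import json

# Flags shared by every sentinel entry visible in the diff.
SENTINEL_FLAGS = {
    "lstrip": False,
    "normalized": False,
    "rstrip": False,
    "single_word": False,
    "special": True,
}


def build_sentinel_entries(first_id=8000, extra_ids=100):
    """Hypothetical helper: map str(first_id + i) to the entry for <extra_id_i>."""
    return {
        str(first_id + i): {"content": f"<extra_id_{i}>", **SENTINEL_FLAGS}
        for i in range(extra_ids)
    }


def check_sentinels(decoder, first_id=8000, extra_ids=100):
    """Return the IDs whose entry is missing or differs from the expected pattern."""
    expected = build_sentinel_entries(first_id, extra_ids)
    return [tid for tid, entry in expected.items() if decoder.get(tid) != entry]


entries = build_sentinel_entries()
# Show one regenerated entry in the same shape as the config file.
print(json.dumps({"8049": entries["8049"]}, indent=2))
```

To check a real file, one could load it with `json.load` and pass the relevant mapping to `check_sentinels`; an empty list means every sentinel matches. The ascending layout here (`<extra_id_0>` at the lowest sentinel ID) differs from the stock T5 vocabularies, where `<extra_id_0>` takes the highest ID, so code that hard-codes the stock ordering may need adjusting.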
checkpoints/checkpoint-61500/trainer_state.json CHANGED
@@ -11,863 +11,863 @@
11
  "log_history": [
12
  {
13
  "epoch": 0.016069936363052,
14
- "grad_norm": 0.3969729542732239,
15
- "learning_rate": 4.960146557819631e-05,
16
- "loss": 2.05,
17
  "step": 500
18
  },
19
  {
20
  "epoch": 0.032139872726104,
21
- "grad_norm": 0.3822907507419586,
22
- "learning_rate": 4.919971716912001e-05,
23
- "loss": 1.1207,
24
  "step": 1000
25
  },
26
  {
27
  "epoch": 0.04820980908915601,
28
- "grad_norm": 0.36019280552864075,
29
- "learning_rate": 4.879796876004371e-05,
30
- "loss": 0.9225,
31
  "step": 1500
32
  },
33
  {
34
  "epoch": 0.064279745452208,
35
- "grad_norm": 0.30364033579826355,
36
- "learning_rate": 4.8396220350967415e-05,
37
- "loss": 0.8244,
38
  "step": 2000
39
  },
40
  {
41
  "epoch": 0.08034968181526002,
42
- "grad_norm": 0.45634394884109497,
43
- "learning_rate": 4.799447194189111e-05,
44
- "loss": 0.7506,
45
  "step": 2500
46
  },
47
  {
48
  "epoch": 0.09641961817831202,
49
- "grad_norm": 0.3562425374984741,
50
- "learning_rate": 4.759272353281481e-05,
51
- "loss": 0.7012,
52
  "step": 3000
53
  },
54
  {
55
  "epoch": 0.11248955454136401,
56
- "grad_norm": 0.33726808428764343,
57
- "learning_rate": 4.719097512373851e-05,
58
- "loss": 0.6706,
59
  "step": 3500
60
  },
61
  {
62
  "epoch": 0.128559490904416,
63
- "grad_norm": 0.30098849534988403,
64
- "learning_rate": 4.678922671466221e-05,
65
- "loss": 0.6308,
66
  "step": 4000
67
  },
68
  {
69
  "epoch": 0.14462942726746802,
70
- "grad_norm": 0.29443585872650146,
71
- "learning_rate": 4.6387478305585915e-05,
72
- "loss": 0.6141,
73
  "step": 4500
74
  },
75
  {
76
  "epoch": 0.16069936363052004,
77
- "grad_norm": 0.25647810101509094,
78
- "learning_rate": 4.598572989650961e-05,
79
- "loss": 0.5866,
80
  "step": 5000
81
  },
82
  {
83
  "epoch": 0.17676929999357202,
84
- "grad_norm": 0.2516370415687561,
85
- "learning_rate": 4.558398148743331e-05,
86
- "loss": 0.5665,
87
  "step": 5500
88
  },
89
  {
90
  "epoch": 0.19283923635662403,
91
- "grad_norm": 0.3337278366088867,
92
- "learning_rate": 4.518223307835701e-05,
93
- "loss": 0.5427,
94
  "step": 6000
95
  },
96
  {
97
  "epoch": 0.20890917271967602,
98
- "grad_norm": 0.2592964470386505,
99
- "learning_rate": 4.478048466928072e-05,
100
- "loss": 0.5323,
101
  "step": 6500
102
  },
103
  {
104
  "epoch": 0.22497910908272803,
105
- "grad_norm": 0.28550606966018677,
106
- "learning_rate": 4.437873626020441e-05,
107
- "loss": 0.5187,
108
  "step": 7000
109
  },
110
  {
111
  "epoch": 0.24104904544578004,
112
- "grad_norm": 0.26474013924598694,
113
- "learning_rate": 4.397698785112811e-05,
114
- "loss": 0.5058,
115
  "step": 7500
116
  },
117
  {
118
  "epoch": 0.257118981808832,
119
- "grad_norm": 0.3018198013305664,
120
- "learning_rate": 4.3575239442051814e-05,
121
- "loss": 0.5013,
122
  "step": 8000
123
  },
124
  {
125
  "epoch": 0.27318891817188407,
126
- "grad_norm": 0.2628585994243622,
127
- "learning_rate": 4.317349103297551e-05,
128
- "loss": 0.4883,
129
  "step": 8500
130
  },
131
  {
132
  "epoch": 0.28925885453493605,
133
- "grad_norm": 0.30172979831695557,
134
- "learning_rate": 4.277174262389921e-05,
135
- "loss": 0.4795,
136
  "step": 9000
137
  },
138
  {
139
  "epoch": 0.30532879089798803,
140
- "grad_norm": 0.25293004512786865,
141
- "learning_rate": 4.236999421482291e-05,
142
- "loss": 0.4682,
143
  "step": 9500
144
  },
145
  {
146
  "epoch": 0.3213987272610401,
147
- "grad_norm": 0.2726214528083801,
148
- "learning_rate": 4.196824580574661e-05,
149
- "loss": 0.4641,
150
  "step": 10000
151
  },
152
  {
153
  "epoch": 0.33746866362409206,
154
- "grad_norm": 0.2570224106311798,
155
- "learning_rate": 4.1566497396670314e-05,
156
- "loss": 0.4556,
157
  "step": 10500
158
  },
159
  {
160
  "epoch": 0.35353859998714404,
161
- "grad_norm": 0.26380738615989685,
162
- "learning_rate": 4.1164748987594006e-05,
163
- "loss": 0.449,
164
  "step": 11000
165
  },
166
  {
167
  "epoch": 0.369608536350196,
168
- "grad_norm": 0.2555176913738251,
169
- "learning_rate": 4.076300057851771e-05,
170
- "loss": 0.4412,
171
  "step": 11500
172
  },
173
  {
174
  "epoch": 0.38567847271324807,
175
- "grad_norm": 0.2122594565153122,
176
- "learning_rate": 4.036125216944141e-05,
177
- "loss": 0.4365,
178
  "step": 12000
179
  },
180
  {
181
  "epoch": 0.40174840907630005,
182
- "grad_norm": 0.2333071529865265,
183
- "learning_rate": 3.9959503760365116e-05,
184
- "loss": 0.433,
185
  "step": 12500
186
  },
187
  {
188
  "epoch": 0.41781834543935203,
189
- "grad_norm": 0.24873752892017365,
190
- "learning_rate": 3.955775535128881e-05,
191
- "loss": 0.4283,
192
  "step": 13000
193
  },
194
  {
195
  "epoch": 0.4338882818024041,
196
- "grad_norm": 0.32416871190071106,
197
- "learning_rate": 3.915600694221251e-05,
198
- "loss": 0.4218,
199
  "step": 13500
200
  },
201
  {
202
  "epoch": 0.44995821816545606,
203
- "grad_norm": 0.23515433073043823,
204
- "learning_rate": 3.875425853313621e-05,
205
- "loss": 0.4139,
206
  "step": 14000
207
  },
208
  {
209
  "epoch": 0.46602815452850804,
210
- "grad_norm": 0.22002151608467102,
211
- "learning_rate": 3.8353313620878064e-05,
212
- "loss": 0.417,
213
  "step": 14500
214
  },
215
  {
216
  "epoch": 0.4820980908915601,
217
- "grad_norm": 0.251897931098938,
218
- "learning_rate": 3.795156521180176e-05,
219
- "loss": 0.4106,
220
  "step": 15000
221
  },
222
  {
223
  "epoch": 0.49816802725461207,
224
- "grad_norm": 0.26212435960769653,
225
- "learning_rate": 3.754981680272546e-05,
226
- "loss": 0.4037,
227
  "step": 15500
228
  },
229
  {
230
  "epoch": 0.514237963617664,
231
- "grad_norm": 0.2718159258365631,
232
  "learning_rate": 3.714887189046731e-05,
233
- "loss": 0.402,
234
  "step": 16000
235
  },
236
  {
237
  "epoch": 0.530307899980716,
238
- "grad_norm": 0.23812739551067352,
239
- "learning_rate": 3.674712348139102e-05,
240
- "loss": 0.3953,
241
  "step": 16500
242
  },
243
  {
244
  "epoch": 0.5463778363437681,
245
- "grad_norm": 0.21076083183288574,
246
- "learning_rate": 3.634537507231471e-05,
247
- "loss": 0.3938,
248
  "step": 17000
249
  },
250
  {
251
  "epoch": 0.5624477727068201,
252
- "grad_norm": 0.25489869713783264,
253
- "learning_rate": 3.5943626663238416e-05,
254
- "loss": 0.3921,
255
  "step": 17500
256
  },
257
  {
258
  "epoch": 0.5785177090698721,
259
- "grad_norm": 0.24057357013225555,
260
- "learning_rate": 3.5541878254162115e-05,
261
- "loss": 0.3867,
262
  "step": 18000
263
  },
264
  {
265
  "epoch": 0.5945876454329241,
266
- "grad_norm": 0.24298915266990662,
267
- "learning_rate": 3.514012984508582e-05,
268
- "loss": 0.3868,
269
  "step": 18500
270
  },
271
  {
272
  "epoch": 0.6106575817959761,
273
- "grad_norm": 0.2183919996023178,
274
- "learning_rate": 3.473838143600951e-05,
275
- "loss": 0.3803,
276
  "step": 19000
277
  },
278
  {
279
  "epoch": 0.626727518159028,
280
- "grad_norm": 0.2278251349925995,
281
- "learning_rate": 3.433663302693321e-05,
282
- "loss": 0.3775,
283
  "step": 19500
284
  },
285
  {
286
  "epoch": 0.6427974545220801,
287
- "grad_norm": 0.240201935172081,
288
- "learning_rate": 3.393568811467507e-05,
289
- "loss": 0.3751,
290
  "step": 20000
291
  },
292
  {
293
  "epoch": 0.6588673908851321,
294
- "grad_norm": 0.21118561923503876,
295
- "learning_rate": 3.353393970559877e-05,
296
- "loss": 0.3742,
297
  "step": 20500
298
  },
299
  {
300
  "epoch": 0.6749373272481841,
301
- "grad_norm": 0.22640825808048248,
302
- "learning_rate": 3.313219129652247e-05,
303
- "loss": 0.3729,
304
  "step": 21000
305
  },
306
  {
307
  "epoch": 0.6910072636112361,
308
- "grad_norm": 0.23105542361736298,
309
- "learning_rate": 3.2730442887446166e-05,
310
- "loss": 0.3687,
311
  "step": 21500
312
  },
313
  {
314
  "epoch": 0.7070771999742881,
315
- "grad_norm": 0.24791008234024048,
316
- "learning_rate": 3.2329497975188024e-05,
317
- "loss": 0.3658,
318
  "step": 22000
319
  },
320
  {
321
  "epoch": 0.7231471363373401,
322
- "grad_norm": 0.2497881054878235,
323
- "learning_rate": 3.1928553062929875e-05,
324
- "loss": 0.3646,
325
  "step": 22500
326
  },
327
  {
328
  "epoch": 0.739217072700392,
329
- "grad_norm": 0.2395261973142624,
330
- "learning_rate": 3.152680465385357e-05,
331
- "loss": 0.3655,
332
  "step": 23000
333
  },
334
  {
335
  "epoch": 0.7552870090634441,
336
- "grad_norm": 0.21194589138031006,
337
- "learning_rate": 3.112505624477727e-05,
338
- "loss": 0.3646,
339
  "step": 23500
340
  },
341
  {
342
  "epoch": 0.7713569454264961,
343
- "grad_norm": 0.21682508289813995,
344
- "learning_rate": 3.072330783570097e-05,
345
- "loss": 0.3629,
346
  "step": 24000
347
  },
348
  {
349
  "epoch": 0.7874268817895481,
350
- "grad_norm": 0.23710566759109497,
351
- "learning_rate": 3.0321559426624674e-05,
352
- "loss": 0.3583,
353
  "step": 24500
354
  },
355
  {
356
  "epoch": 0.8034968181526001,
357
- "grad_norm": 0.23857219517230988,
358
- "learning_rate": 2.9919811017548372e-05,
359
- "loss": 0.3561,
360
  "step": 25000
361
  },
362
  {
363
  "epoch": 0.8195667545156521,
364
- "grad_norm": 0.241951584815979,
365
- "learning_rate": 2.9518062608472075e-05,
366
- "loss": 0.3537,
367
  "step": 25500
368
  },
369
  {
370
  "epoch": 0.8356366908787041,
371
- "grad_norm": 0.275765061378479,
372
- "learning_rate": 2.9116314199395773e-05,
373
- "loss": 0.3493,
374
  "step": 26000
375
  },
376
  {
377
  "epoch": 0.8517066272417562,
378
- "grad_norm": 0.24757184088230133,
379
  "learning_rate": 2.871536928713762e-05,
380
- "loss": 0.3486,
381
  "step": 26500
382
  },
383
  {
384
  "epoch": 0.8677765636048081,
385
- "grad_norm": 0.21833688020706177,
386
  "learning_rate": 2.8313620878061327e-05,
387
- "loss": 0.3461,
388
  "step": 27000
389
  },
390
  {
391
  "epoch": 0.8838464999678601,
392
- "grad_norm": 0.21623168885707855,
393
  "learning_rate": 2.7911872468985022e-05,
394
- "loss": 0.3468,
395
  "step": 27500
396
  },
397
  {
398
  "epoch": 0.8999164363309121,
399
- "grad_norm": 0.20861521363258362,
400
  "learning_rate": 2.7510124059908728e-05,
401
- "loss": 0.3481,
402
  "step": 28000
403
  },
404
  {
405
  "epoch": 0.9159863726939641,
406
- "grad_norm": 0.20291315019130707,
407
- "learning_rate": 2.7108375650832423e-05,
408
- "loss": 0.3474,
409
  "step": 28500
410
  },
411
  {
412
  "epoch": 0.9320563090570161,
413
- "grad_norm": 0.2101660966873169,
414
  "learning_rate": 2.6707430738574275e-05,
415
- "loss": 0.3412,
416
  "step": 29000
417
  },
418
  {
419
  "epoch": 0.9481262454200682,
420
- "grad_norm": 0.23224739730358124,
421
  "learning_rate": 2.6305682329497977e-05,
422
- "loss": 0.3422,
423
  "step": 29500
424
  },
425
  {
426
  "epoch": 0.9641961817831202,
427
- "grad_norm": 0.22987599670886993,
428
  "learning_rate": 2.5903933920421676e-05,
429
- "loss": 0.3407,
430
  "step": 30000
431
  },
432
  {
433
  "epoch": 0.9802661181461721,
434
- "grad_norm": 0.22307533025741577,
435
- "learning_rate": 2.5502185511345378e-05,
436
- "loss": 0.3365,
437
  "step": 30500
438
  },
439
  {
440
  "epoch": 0.9963360545092241,
441
- "grad_norm": 0.20577801764011383,
442
  "learning_rate": 2.510124059908723e-05,
443
- "loss": 0.3409,
444
  "step": 31000
445
  },
446
  {
447
  "epoch": 1.0124059908722762,
448
- "grad_norm": 0.23968417942523956,
449
  "learning_rate": 2.4699492190010928e-05,
450
- "loss": 0.339,
451
  "step": 31500
452
  },
453
  {
454
  "epoch": 1.028475927235328,
455
- "grad_norm": 0.2166174054145813,
456
  "learning_rate": 2.429774378093463e-05,
457
- "loss": 0.3317,
458
  "step": 32000
459
  },
460
  {
461
  "epoch": 1.0445458635983802,
462
- "grad_norm": 0.22259151935577393,
463
  "learning_rate": 2.389599537185833e-05,
464
- "loss": 0.3404,
465
  "step": 32500
466
  },
467
  {
468
  "epoch": 1.060615799961432,
469
- "grad_norm": 0.2585219442844391,
470
- "learning_rate": 2.3495050459600184e-05,
471
- "loss": 0.3322,
472
  "step": 33000
473
  },
474
  {
475
  "epoch": 1.0766857363244842,
476
- "grad_norm": 0.23949937522411346,
477
- "learning_rate": 2.3093302050523882e-05,
478
- "loss": 0.3332,
479
  "step": 33500
480
  },
481
  {
482
  "epoch": 1.0927556726875363,
483
- "grad_norm": 0.2360944151878357,
484
  "learning_rate": 2.269155364144758e-05,
485
- "loss": 0.3374,
486
  "step": 34000
487
  },
488
  {
489
  "epoch": 1.1088256090505881,
490
- "grad_norm": 0.23383018374443054,
491
  "learning_rate": 2.228980523237128e-05,
492
- "loss": 0.3287,
493
  "step": 34500
494
  },
495
  {
496
  "epoch": 1.1248955454136402,
497
- "grad_norm": 0.25602060556411743,
498
- "learning_rate": 2.1888860320113135e-05,
499
- "loss": 0.3262,
500
  "step": 35000
501
  },
502
  {
503
  "epoch": 1.140965481776692,
504
- "grad_norm": 0.2233658730983734,
505
- "learning_rate": 2.1487111911036833e-05,
506
- "loss": 0.3294,
507
  "step": 35500
508
  },
509
  {
510
  "epoch": 1.1570354181397442,
511
- "grad_norm": 0.23545712232589722,
512
- "learning_rate": 2.1085363501960532e-05,
513
- "loss": 0.3263,
514
  "step": 36000
515
  },
516
  {
517
  "epoch": 1.173105354502796,
518
- "grad_norm": 0.22479598224163055,
519
- "learning_rate": 2.0683615092884234e-05,
520
- "loss": 0.328,
521
  "step": 36500
522
  },
523
  {
524
  "epoch": 1.1891752908658482,
525
- "grad_norm": 0.22207121551036835,
526
- "learning_rate": 2.0282670180626086e-05,
527
- "loss": 0.3275,
528
  "step": 37000
529
  },
530
  {
531
  "epoch": 1.2052452272289003,
532
- "grad_norm": 0.23822110891342163,
533
- "learning_rate": 1.9880921771549785e-05,
534
- "loss": 0.3273,
535
  "step": 37500
536
  },
537
  {
538
  "epoch": 1.2213151635919521,
539
- "grad_norm": 0.23664866387844086,
540
- "learning_rate": 1.9479173362473487e-05,
541
- "loss": 0.318,
542
  "step": 38000
543
  },
544
  {
545
  "epoch": 1.2373850999550042,
546
- "grad_norm": 0.18543508648872375,
547
- "learning_rate": 1.9077424953397185e-05,
548
- "loss": 0.3235,
549
  "step": 38500
550
  },
551
  {
552
  "epoch": 1.253455036318056,
553
- "grad_norm": 0.23305822908878326,
554
- "learning_rate": 1.8676480041139037e-05,
555
- "loss": 0.3243,
556
  "step": 39000
557
  },
558
  {
559
  "epoch": 1.2695249726811082,
560
- "grad_norm": 0.21699073910713196,
561
- "learning_rate": 1.827473163206274e-05,
562
- "loss": 0.3222,
563
  "step": 39500
564
  },
565
  {
566
  "epoch": 1.28559490904416,
567
- "grad_norm": 0.2757895588874817,
568
  "learning_rate": 1.7872983222986438e-05,
569
- "loss": 0.3248,
570
  "step": 40000
571
  },
572
  {
573
  "epoch": 1.3016648454072122,
574
- "grad_norm": 0.19769324362277985,
575
  "learning_rate": 1.7471234813910137e-05,
576
- "loss": 0.3179,
577
  "step": 40500
578
  },
579
  {
580
  "epoch": 1.3177347817702643,
581
- "grad_norm": 0.18964402377605438,
582
- "learning_rate": 1.707028990165199e-05,
583
- "loss": 0.3178,
584
  "step": 41000
585
  },
586
  {
587
  "epoch": 1.3338047181333161,
588
- "grad_norm": 0.2584107220172882,
589
- "learning_rate": 1.666854149257569e-05,
590
- "loss": 0.318,
591
  "step": 41500
592
  },
593
  {
594
  "epoch": 1.3498746544963682,
595
- "grad_norm": 0.25919750332832336,
596
- "learning_rate": 1.626759658031754e-05,
597
- "loss": 0.3205,
598
  "step": 42000
599
  },
600
  {
601
  "epoch": 1.3659445908594203,
602
- "grad_norm": 0.24371759593486786,
603
- "learning_rate": 1.5865848171241244e-05,
604
- "loss": 0.3186,
605
  "step": 42500
606
  },
607
  {
608
  "epoch": 1.3820145272224722,
609
- "grad_norm": 0.24457883834838867,
610
- "learning_rate": 1.5464099762164942e-05,
611
- "loss": 0.3162,
612
  "step": 43000
613
  },
614
  {
615
  "epoch": 1.398084463585524,
616
- "grad_norm": 0.1918337345123291,
617
- "learning_rate": 1.5062351353088641e-05,
618
- "loss": 0.3169,
619
  "step": 43500
620
  },
621
  {
622
  "epoch": 1.4141543999485762,
623
- "grad_norm": 0.2350657880306244,
624
- "learning_rate": 1.4660602944012342e-05,
625
- "loss": 0.3171,
626
  "step": 44000
627
  },
628
  {
629
  "epoch": 1.4302243363116283,
630
- "grad_norm": 0.2481279820203781,
631
- "learning_rate": 1.4258854534936042e-05,
632
- "loss": 0.3179,
633
  "step": 44500
634
  },
635
  {
636
  "epoch": 1.4462942726746801,
637
- "grad_norm": 0.21132701635360718,
638
- "learning_rate": 1.3857106125859743e-05,
639
- "loss": 0.3125,
640
  "step": 45000
641
  },
642
  {
643
  "epoch": 1.4623642090377322,
644
- "grad_norm": 0.20240716636180878,
645
- "learning_rate": 1.3455357716783443e-05,
646
- "loss": 0.3172,
647
  "step": 45500
648
  },
649
  {
650
  "epoch": 1.4784341454007843,
651
- "grad_norm": 0.2224823385477066,
652
- "learning_rate": 1.3054412804525296e-05,
653
- "loss": 0.3151,
654
  "step": 46000
655
  },
656
  {
657
  "epoch": 1.4945040817638362,
658
- "grad_norm": 0.19261781871318817,
659
  "learning_rate": 1.2652664395448997e-05,
660
- "loss": 0.312,
661
  "step": 46500
662
  },
663
  {
664
  "epoch": 1.510574018126888,
665
- "grad_norm": 0.16068917512893677,
666
  "learning_rate": 1.2250915986372695e-05,
667
- "loss": 0.3145,
668
  "step": 47000
669
  },
670
  {
671
  "epoch": 1.5266439544899402,
672
- "grad_norm": 0.18192972242832184,
673
  "learning_rate": 1.1849167577296394e-05,
674
- "loss": 0.3134,
675
  "step": 47500
676
  },
677
  {
678
  "epoch": 1.5427138908529923,
679
- "grad_norm": 0.19884943962097168,
680
- "learning_rate": 1.1448222665038247e-05,
681
- "loss": 0.3119,
682
  "step": 48000
683
  },
684
  {
685
  "epoch": 1.5587838272160441,
686
- "grad_norm": 0.1883106529712677,
687
- "learning_rate": 1.1046474255961948e-05,
688
- "loss": 0.316,
689
  "step": 48500
690
  },
691
  {
692
  "epoch": 1.5748537635790962,
693
- "grad_norm": 0.19331087172031403,
694
- "learning_rate": 1.0644725846885646e-05,
695
- "loss": 0.3135,
696
  "step": 49000
697
  },
698
  {
699
  "epoch": 1.5909236999421483,
700
- "grad_norm": 0.20041531324386597,
701
  "learning_rate": 1.0242977437809347e-05,
702
- "loss": 0.3112,
703
  "step": 49500
704
  },
705
  {
706
  "epoch": 1.6069936363052002,
707
- "grad_norm": 0.18530187010765076,
708
- "learning_rate": 9.8420325255512e-06,
709
- "loss": 0.3122,
710
  "step": 50000
711
  },
712
  {
713
  "epoch": 1.623063572668252,
714
- "grad_norm": 0.22725620865821838,
715
- "learning_rate": 9.4402841164749e-06,
716
- "loss": 0.3122,
717
  "step": 50500
718
  },
719
  {
720
  "epoch": 1.6391335090313044,
721
- "grad_norm": 0.23093479871749878,
722
- "learning_rate": 9.0385357073986e-06,
723
- "loss": 0.3149,
724
  "step": 51000
725
  },
726
  {
727
  "epoch": 1.6552034453943563,
728
- "grad_norm": 0.19580845534801483,
729
- "learning_rate": 8.6367872983223e-06,
730
- "loss": 0.3121,
731
  "step": 51500
732
  },
733
  {
734
  "epoch": 1.6712733817574081,
735
- "grad_norm": 0.1742846667766571,
736
- "learning_rate": 8.235842386064153e-06,
737
- "loss": 0.3094,
738
  "step": 52000
739
  },
740
  {
741
  "epoch": 1.6873433181204602,
742
- "grad_norm": 0.18685191869735718,
743
- "learning_rate": 7.834093976987852e-06,
744
- "loss": 0.309,
745
  "step": 52500
746
  },
747
  {
748
  "epoch": 1.7034132544835123,
749
- "grad_norm": 0.21959276497364044,
750
- "learning_rate": 7.432345567911551e-06,
751
- "loss": 0.3118,
752
  "step": 53000
753
  },
754
  {
755
  "epoch": 1.7194831908465642,
756
- "grad_norm": 0.1935770958662033,
757
- "learning_rate": 7.030597158835252e-06,
758
- "loss": 0.3106,
759
  "step": 53500
760
  },
761
  {
762
  "epoch": 1.7355531272096163,
763
- "grad_norm": 0.19977129995822906,
764
- "learning_rate": 6.629652246577103e-06,
765
- "loss": 0.3101,
766
  "step": 54000
767
  },
768
  {
769
  "epoch": 1.7516230635726684,
770
- "grad_norm": 0.2006288766860962,
771
- "learning_rate": 6.2279038375008035e-06,
772
- "loss": 0.3099,
773
  "step": 54500
774
  },
775
  {
776
  "epoch": 1.7676929999357203,
777
- "grad_norm": 0.19280743598937988,
778
- "learning_rate": 5.826155428424504e-06,
779
- "loss": 0.308,
780
  "step": 55000
781
  },
782
  {
783
  "epoch": 1.7837629362987721,
784
- "grad_norm": 0.22095157206058502,
785
- "learning_rate": 5.424407019348204e-06,
786
- "loss": 0.3069,
787
  "step": 55500
788
  },
789
  {
790
  "epoch": 1.7998328726618242,
791
- "grad_norm": 0.2091740071773529,
792
  "learning_rate": 5.022658610271903e-06,
793
- "loss": 0.3062,
794
  "step": 56000
795
  },
796
  {
797
  "epoch": 1.8159028090248763,
798
- "grad_norm": 0.24772244691848755,
799
  "learning_rate": 4.620910201195604e-06,
800
- "loss": 0.3093,
801
  "step": 56500
802
  },
803
  {
804
  "epoch": 1.8319727453879282,
805
- "grad_norm": 0.1973961740732193,
806
  "learning_rate": 4.219161792119303e-06,
807
- "loss": 0.309,
808
  "step": 57000
809
  },
810
  {
811
  "epoch": 1.8480426817509803,
812
- "grad_norm": 0.22767914831638336,
813
  "learning_rate": 3.817413383043003e-06,
814
- "loss": 0.3109,
815
  "step": 57500
816
  },
817
  {
818
  "epoch": 1.8641126181140324,
819
- "grad_norm": 0.21461111307144165,
820
- "learning_rate": 3.416468470784856e-06,
821
- "loss": 0.3075,
822
  "step": 58000
823
  },
824
  {
825
  "epoch": 1.8801825544770843,
826
- "grad_norm": 0.24607454240322113,
827
- "learning_rate": 3.0147200617085557e-06,
828
- "loss": 0.3058,
829
  "step": 58500
830
  },
831
  {
832
  "epoch": 1.8962524908401361,
833
- "grad_norm": 0.19667118787765503,
834
  "learning_rate": 2.6129716526322558e-06,
835
- "loss": 0.3072,
836
  "step": 59000
837
  },
838
  {
839
  "epoch": 1.9123224272031882,
840
- "grad_norm": 0.22604137659072876,
841
  "learning_rate": 2.211223243555956e-06,
842
- "loss": 0.3064,
843
  "step": 59500
844
  },
845
  {
846
  "epoch": 1.9283923635662403,
847
- "grad_norm": 0.1879967898130417,
848
- "learning_rate": 1.8102783312978082e-06,
849
- "loss": 0.3063,
850
  "step": 60000
851
  },
852
  {
853
  "epoch": 1.9444622999292922,
854
- "grad_norm": 0.21271295845508575,
855
- "learning_rate": 1.408529922221508e-06,
856
- "loss": 0.3076,
857
  "step": 60500
858
  },
859
  {
860
  "epoch": 1.9605322362923443,
861
- "grad_norm": 0.16714586317539215,
862
- "learning_rate": 1.006781513145208e-06,
863
- "loss": 0.3092,
864
  "step": 61000
865
  },
866
  {
867
  "epoch": 1.9766021726553964,
868
- "grad_norm": 0.20666128396987915,
869
- "learning_rate": 6.050331040689079e-07,
870
- "loss": 0.3076,
871
  "step": 61500
872
  }
873
  ],
@@ -888,7 +888,7 @@
888
  "attributes": {}
889
  }
890
  },
891
- "total_flos": 1.3317619730664653e+17,
892
  "train_batch_size": 32,
893
  "trial_name": null,
894
  "trial_params": null
 
11
  "log_history": [
12
  {
13
  "epoch": 0.016069936363052,
14
+ "grad_norm": 0.2569522559642792,
15
+ "learning_rate": 4.960307257183262e-05,
16
+ "loss": 2.9119,
17
  "step": 500
18
  },
19
  {
20
  "epoch": 0.032139872726104,
21
+ "grad_norm": 0.26731985807418823,
22
+ "learning_rate": 4.9201324162756315e-05,
23
+ "loss": 2.2886,
24
  "step": 1000
25
  },
26
  {
27
  "epoch": 0.04820980908915601,
28
+ "grad_norm": 0.3099210560321808,
29
+ "learning_rate": 4.8799575753680014e-05,
30
+ "loss": 2.1431,
31
  "step": 1500
32
  },
33
  {
34
  "epoch": 0.064279745452208,
35
+ "grad_norm": 0.28836730122566223,
36
+ "learning_rate": 4.839782734460372e-05,
37
+ "loss": 2.0369,
38
  "step": 2000
39
  },
40
  {
41
  "epoch": 0.08034968181526002,
42
+ "grad_norm": 0.4808545708656311,
43
+ "learning_rate": 4.799607893552742e-05,
44
+ "loss": 1.932,
45
  "step": 2500
46
  },
47
  {
48
  "epoch": 0.09641961817831202,
49
+ "grad_norm": 0.38000208139419556,
50
+ "learning_rate": 4.759433052645112e-05,
51
+ "loss": 1.7766,
52
  "step": 3000
53
  },
54
  {
55
  "epoch": 0.11248955454136401,
56
+ "grad_norm": 0.4310196340084076,
57
+ "learning_rate": 4.7192582117374816e-05,
58
+ "loss": 1.6022,
59
  "step": 3500
60
  },
61
  {
62
  "epoch": 0.128559490904416,
63
+ "grad_norm": 0.40425005555152893,
64
+ "learning_rate": 4.6790833708298515e-05,
65
+ "loss": 1.4576,
66
  "step": 4000
67
  },
68
  {
69
  "epoch": 0.14462942726746802,
70
+ "grad_norm": 0.3811793327331543,
71
+ "learning_rate": 4.638908529922222e-05,
72
+ "loss": 1.3384,
73
  "step": 4500
74
  },
75
  {
76
  "epoch": 0.16069936363052004,
77
+ "grad_norm": 0.38943949341773987,
78
+ "learning_rate": 4.598733689014591e-05,
79
+ "loss": 1.2233,
80
  "step": 5000
81
  },
82
  {
83
  "epoch": 0.17676929999357202,
84
+ "grad_norm": 0.5517480373382568,
85
+ "learning_rate": 4.558558848106962e-05,
86
+ "loss": 1.1342,
87
  "step": 5500
88
  },
89
  {
90
  "epoch": 0.19283923635662403,
91
+ "grad_norm": 0.4235232174396515,
92
+ "learning_rate": 4.518384007199332e-05,
93
+ "loss": 1.0432,
94
  "step": 6000
95
  },
96
  {
97
  "epoch": 0.20890917271967602,
98
+ "grad_norm": 0.4617592692375183,
99
+ "learning_rate": 4.478209166291702e-05,
100
+ "loss": 0.9781,
101
  "step": 6500
102
  },
103
  {
104
  "epoch": 0.22497910908272803,
105
+ "grad_norm": 0.5447149872779846,
106
+ "learning_rate": 4.4380343253840714e-05,
107
+ "loss": 0.927,
108
  "step": 7000
109
  },
110
  {
111
  "epoch": 0.24104904544578004,
112
+ "grad_norm": 0.4740816354751587,
113
+ "learning_rate": 4.397859484476441e-05,
114
+ "loss": 0.8674,
115
  "step": 7500
116
  },
117
  {
118
  "epoch": 0.257118981808832,
119
+ "grad_norm": 0.5207423567771912,
120
+ "learning_rate": 4.357684643568812e-05,
121
+ "loss": 0.8149,
122
  "step": 8000
123
  },
124
  {
125
  "epoch": 0.27318891817188407,
126
+ "grad_norm": 0.47738897800445557,
127
+ "learning_rate": 4.317509802661182e-05,
128
+ "loss": 0.7685,
129
  "step": 8500
130
  },
131
  {
132
  "epoch": 0.28925885453493605,
133
+ "grad_norm": 0.4176841676235199,
134
+ "learning_rate": 4.2773349617535516e-05,
135
+ "loss": 0.7119,
136
  "step": 9000
137
  },
138
  {
139
  "epoch": 0.30532879089798803,
140
+ "grad_norm": 0.381345272064209,
141
+ "learning_rate": 4.2371601208459215e-05,
142
+ "loss": 0.6682,
143
  "step": 9500
144
  },
145
  {
146
  "epoch": 0.3213987272610401,
147
+ "grad_norm": 0.6301918625831604,
148
+ "learning_rate": 4.1969852799382914e-05,
149
+ "loss": 0.6505,
150
  "step": 10000
151
  },
152
  {
153
  "epoch": 0.33746866362409206,
154
+ "grad_norm": 0.4057278335094452,
155
+ "learning_rate": 4.156810439030662e-05,
156
+ "loss": 0.6063,
157
  "step": 10500
158
  },
159
  {
160
  "epoch": 0.35353859998714404,
161
+ "grad_norm": 0.5442121624946594,
162
+ "learning_rate": 4.116635598123031e-05,
163
+ "loss": 0.5735,
164
  "step": 11000
165
  },
166
  {
167
  "epoch": 0.369608536350196,
168
+ "grad_norm": 0.5113051533699036,
169
+ "learning_rate": 4.076460757215402e-05,
170
+ "loss": 0.5432,
171
  "step": 11500
172
  },
173
  {
174
  "epoch": 0.38567847271324807,
175
+ "grad_norm": 0.6383316516876221,
176
+ "learning_rate": 4.0362859163077716e-05,
177
+ "loss": 0.5143,
178
  "step": 12000
179
  },
180
  {
181
  "epoch": 0.40174840907630005,
182
+ "grad_norm": 0.4316321611404419,
183
+ "learning_rate": 3.996111075400142e-05,
184
+ "loss": 0.4867,
185
  "step": 12500
186
  },
187
  {
188
  "epoch": 0.41781834543935203,
189
+ "grad_norm": 0.42703017592430115,
190
+ "learning_rate": 3.955936234492511e-05,
191
+ "loss": 0.4614,
192
  "step": 13000
193
  },
194
  {
195
  "epoch": 0.4338882818024041,
196
+ "grad_norm": 0.4263227880001068,
197
+ "learning_rate": 3.915761393584881e-05,
198
+ "loss": 0.4391,
199
  "step": 13500
200
  },
201
  {
202
  "epoch": 0.44995821816545606,
203
+ "grad_norm": 0.47577473521232605,
204
+ "learning_rate": 3.875586552677252e-05,
205
+ "loss": 0.4241,
206
  "step": 14000
207
  },
208
  {
209
  "epoch": 0.46602815452850804,
210
+ "grad_norm": 0.3419073224067688,
211
+ "learning_rate": 3.8354117117696216e-05,
212
+ "loss": 0.4019,
213
  "step": 14500
214
  },
215
  {
216
  "epoch": 0.4820980908915601,
217
+ "grad_norm": 0.3402538001537323,
218
+ "learning_rate": 3.7952368708619915e-05,
219
+ "loss": 0.3876,
220
  "step": 15000
221
  },
222
  {
223
  "epoch": 0.49816802725461207,
224
+ "grad_norm": 0.7072747349739075,
225
+ "learning_rate": 3.7550620299543614e-05,
226
+ "loss": 0.364,
227
  "step": 15500
228
  },
229
  {
230
  "epoch": 0.514237963617664,
231
+ "grad_norm": 0.31305554509162903,
232
  "learning_rate": 3.714887189046731e-05,
233
+ "loss": 0.3463,
234
  "step": 16000
235
  },
236
  {
237
  "epoch": 0.530307899980716,
238
+ "grad_norm": 0.4203876554965973,
239
+ "learning_rate": 3.674792697820917e-05,
240
+ "loss": 0.3371,
241
  "step": 16500
242
  },
243
  {
244
  "epoch": 0.5463778363437681,
245
+ "grad_norm": 0.49149152636528015,
246
+ "learning_rate": 3.634617856913286e-05,
247
+ "loss": 0.3189,
248
  "step": 17000
249
  },
250
  {
251
  "epoch": 0.5624477727068201,
252
+ "grad_norm": 0.6438118815422058,
253
+ "learning_rate": 3.594443016005657e-05,
254
+ "loss": 0.3074,
255
  "step": 17500
256
  },
257
  {
258
  "epoch": 0.5785177090698721,
259
+ "grad_norm": 0.6619039177894592,
260
+ "learning_rate": 3.554268175098027e-05,
261
+ "loss": 0.2989,
262
  "step": 18000
263
  },
264
  {
265
  "epoch": 0.5945876454329241,
266
+ "grad_norm": 0.39272341132164,
267
+ "learning_rate": 3.514093334190397e-05,
268
+ "loss": 0.2818,
269
  "step": 18500
270
  },
271
  {
272
  "epoch": 0.6106575817959761,
273
+ "grad_norm": 0.3980565369129181,
274
+ "learning_rate": 3.473998842964582e-05,
275
+ "loss": 0.273,
276
  "step": 19000
277
  },
278
  {
279
  "epoch": 0.626727518159028,
280
+ "grad_norm": 0.3052268922328949,
281
+ "learning_rate": 3.4338240020569516e-05,
282
+ "loss": 0.2677,
283
  "step": 19500
284
  },
285
  {
286
  "epoch": 0.6427974545220801,
287
+ "grad_norm": 0.5999760031700134,
288
+ "learning_rate": 3.3937295108311374e-05,
289
+ "loss": 0.2572,
290
  "step": 20000
291
  },
292
  {
293
  "epoch": 0.6588673908851321,
294
+ "grad_norm": 0.4283508062362671,
295
+ "learning_rate": 3.3536350196053226e-05,
296
+ "loss": 0.2468,
297
  "step": 20500
298
  },
299
  {
300
  "epoch": 0.6749373272481841,
301
+ "grad_norm": 0.4289894700050354,
302
+ "learning_rate": 3.3134601786976924e-05,
303
+ "loss": 0.2414,
304
  "step": 21000
305
  },
306
  {
307
  "epoch": 0.6910072636112361,
308
+ "grad_norm": 0.26386120915412903,
309
+ "learning_rate": 3.273285337790062e-05,
310
+ "loss": 0.2422,
311
  "step": 21500
312
  },
313
  {
314
  "epoch": 0.7070771999742881,
315
+ "grad_norm": 0.41095244884490967,
316
+ "learning_rate": 3.233110496882433e-05,
317
+ "loss": 0.2282,
318
  "step": 22000
319
  },
320
  {
321
  "epoch": 0.7231471363373401,
322
+ "grad_norm": 0.29514652490615845,
323
+ "learning_rate": 3.192935655974803e-05,
324
+ "loss": 0.2252,
325
  "step": 22500
326
  },
327
  {
328
  "epoch": 0.739217072700392,
329
+ "grad_norm": 0.4044126570224762,
330
+ "learning_rate": 3.152760815067172e-05,
331
+ "loss": 0.2211,
332
  "step": 23000
333
  },
334
  {
335
  "epoch": 0.7552870090634441,
336
+ "grad_norm": 0.3767038881778717,
337
+ "learning_rate": 3.1125859741595425e-05,
338
+ "loss": 0.2115,
339
  "step": 23500
340
  },
341
  {
342
  "epoch": 0.7713569454264961,
343
+ "grad_norm": 0.36812517046928406,
344
+ "learning_rate": 3.0724111332519124e-05,
345
+ "loss": 0.2059,
346
  "step": 24000
347
  },
348
  {
349
  "epoch": 0.7874268817895481,
350
+ "grad_norm": 0.3709106147289276,
351
+ "learning_rate": 3.0322362923442826e-05,
352
+ "loss": 0.2035,
353
  "step": 24500
354
  },
355
  {
356
  "epoch": 0.8034968181526001,
357
+ "grad_norm": 0.3285115361213684,
358
+ "learning_rate": 2.9920614514366525e-05,
359
+ "loss": 0.1993,
360
  "step": 25000
361
  },
362
  {
363
  "epoch": 0.8195667545156521,
364
+ "grad_norm": 0.3229790925979614,
365
+ "learning_rate": 2.9518866105290227e-05,
366
+ "loss": 0.1968,
367
  "step": 25500
368
  },
369
  {
370
  "epoch": 0.8356366908787041,
371
+ "grad_norm": 0.37397509813308716,
372
+ "learning_rate": 2.9117117696213926e-05,
373
+ "loss": 0.194,
374
  "step": 26000
375
  },
376
  {
377
  "epoch": 0.8517066272417562,
378
+ "grad_norm": 0.33143311738967896,
379
  "learning_rate": 2.871536928713762e-05,
380
+ "loss": 0.1875,
381
  "step": 26500
382
  },
383
  {
384
  "epoch": 0.8677765636048081,
385
+ "grad_norm": 0.2748125493526459,
386
  "learning_rate": 2.8313620878061327e-05,
387
+ "loss": 0.1854,
388
  "step": 27000
389
  },
390
  {
391
  "epoch": 0.8838464999678601,
392
+ "grad_norm": 0.2606910169124603,
393
  "learning_rate": 2.7911872468985022e-05,
394
+ "loss": 0.1809,
395
  "step": 27500
396
  },
397
  {
398
  "epoch": 0.8999164363309121,
399
+ "grad_norm": 0.28182655572891235,
400
  "learning_rate": 2.7510124059908728e-05,
401
+ "loss": 0.1815,
402
  "step": 28000
403
  },
404
  {
405
  "epoch": 0.9159863726939641,
406
+ "grad_norm": 0.3056446313858032,
407
+ "learning_rate": 2.7109179147650576e-05,
408
+ "loss": 0.1775,
409
  "step": 28500
410
  },
411
  {
412
  "epoch": 0.9320563090570161,
413
+ "grad_norm": 0.2458430379629135,
414
  "learning_rate": 2.6707430738574275e-05,
415
+ "loss": 0.1714,
416
  "step": 29000
417
  },
418
  {
419
  "epoch": 0.9481262454200682,
420
+ "grad_norm": 0.2681204080581665,
421
  "learning_rate": 2.6305682329497977e-05,
422
+ "loss": 0.1734,
423
  "step": 29500
424
  },
425
  {
426
  "epoch": 0.9641961817831202,
427
+ "grad_norm": 0.38170355558395386,
428
  "learning_rate": 2.5903933920421676e-05,
429
+ "loss": 0.1701,
430
  "step": 30000
431
  },
432
  {
433
  "epoch": 0.9802661181461721,
434
+ "grad_norm": 0.43841251730918884,
435
+ "learning_rate": 2.550298900816353e-05,
436
+ "loss": 0.1656,
437
  "step": 30500
438
  },
439
  {
440
  "epoch": 0.9963360545092241,
441
+ "grad_norm": 0.4082754850387573,
442
  "learning_rate": 2.510124059908723e-05,
443
+ "loss": 0.1649,
444
  "step": 31000
445
  },
446
  {
447
  "epoch": 1.0124059908722762,
448
+ "grad_norm": 0.27510714530944824,
449
  "learning_rate": 2.4699492190010928e-05,
450
+ "loss": 0.1636,
451
  "step": 31500
452
  },
453
  {
454
  "epoch": 1.028475927235328,
455
+ "grad_norm": 0.3550429344177246,
456
  "learning_rate": 2.429774378093463e-05,
457
+ "loss": 0.1615,
458
  "step": 32000
459
  },
460
  {
461
  "epoch": 1.0445458635983802,
462
+ "grad_norm": 0.382055401802063,
463
  "learning_rate": 2.389599537185833e-05,
464
+ "loss": 0.1597,
465
  "step": 32500
466
  },
467
  {
468
  "epoch": 1.060615799961432,
469
+ "grad_norm": 0.38698843121528625,
470
+ "learning_rate": 2.349424696278203e-05,
471
+ "loss": 0.155,
472
  "step": 33000
473
  },
474
  {
475
  "epoch": 1.0766857363244842,
476
+ "grad_norm": 0.380403995513916,
477
+ "learning_rate": 2.309249855370573e-05,
478
+ "loss": 0.1594,
479
  "step": 33500
480
  },
481
  {
482
  "epoch": 1.0927556726875363,
483
+ "grad_norm": 0.17210371792316437,
484
  "learning_rate": 2.269155364144758e-05,
485
+ "loss": 0.1543,
486
  "step": 34000
487
  },
488
  {
489
  "epoch": 1.1088256090505881,
490
+ "grad_norm": 0.33378392457962036,
491
  "learning_rate": 2.228980523237128e-05,
492
+ "loss": 0.1549,
493
  "step": 34500
494
  },
495
  {
496
  "epoch": 1.1248955454136402,
497
+ "grad_norm": 0.282175213098526,
498
+ "learning_rate": 2.1888056823294982e-05,
499
+ "loss": 0.1509,
500
  "step": 35000
501
  },
502
  {
503
  "epoch": 1.140965481776692,
504
+ "grad_norm": 0.4829972982406616,
505
+ "learning_rate": 2.148630841421868e-05,
506
+ "loss": 0.1508,
507
  "step": 35500
508
  },
509
  {
510
  "epoch": 1.1570354181397442,
511
+ "grad_norm": 0.4101378321647644,
512
+ "learning_rate": 2.1084560005142383e-05,
513
+ "loss": 0.1487,
514
  "step": 36000
515
  },
516
  {
517
  "epoch": 1.173105354502796,
518
+ "grad_norm": 0.24467173218727112,
519
+ "learning_rate": 2.0682811596066082e-05,
520
+ "loss": 0.1482,
521
  "step": 36500
522
  },
523
  {
524
  "epoch": 1.1891752908658482,
525
+ "grad_norm": 0.2552469074726105,
526
+ "learning_rate": 2.028106318698978e-05,
527
+ "loss": 0.1474,
528
  "step": 37000
529
  },
530
  {
531
  "epoch": 1.2052452272289003,
532
+ "grad_norm": 0.33155035972595215,
533
+ "learning_rate": 1.987931477791348e-05,
534
+ "loss": 0.1427,
535
  "step": 37500
536
  },
537
  {
538
  "epoch": 1.2213151635919521,
539
+ "grad_norm": 0.41133707761764526,
540
+ "learning_rate": 1.9478369865655334e-05,
541
+ "loss": 0.143,
542
  "step": 38000
543
  },
544
  {
545
  "epoch": 1.2373850999550042,
546
+ "grad_norm": 0.36144211888313293,
547
+ "learning_rate": 1.9076621456579033e-05,
548
+ "loss": 0.1387,
549
  "step": 38500
550
  },
551
  {
552
  "epoch": 1.253455036318056,
553
+ "grad_norm": 0.36597776412963867,
554
+ "learning_rate": 1.8674873047502732e-05,
555
+ "loss": 0.1415,
556
  "step": 39000
557
  },
558
  {
559
  "epoch": 1.2695249726811082,
560
+ "grad_norm": 0.37640953063964844,
561
+ "learning_rate": 1.8273124638426434e-05,
562
+ "loss": 0.1408,
563
  "step": 39500
564
  },
565
  {
566
  "epoch": 1.28559490904416,
567
+ "grad_norm": 0.22886815667152405,
568
  "learning_rate": 1.7872983222986438e-05,
569
+ "loss": 0.1366,
570
  "step": 40000
571
  },
572
  {
573
  "epoch": 1.3016648454072122,
574
+ "grad_norm": 0.44980695843696594,
575
  "learning_rate": 1.7471234813910137e-05,
576
+ "loss": 0.1411,
577
  "step": 40500
578
  },
579
  {
580
  "epoch": 1.3177347817702643,
581
+ "grad_norm": 0.46285852789878845,
582
+ "learning_rate": 1.706948640483384e-05,
583
+ "loss": 0.1367,
584
  "step": 41000
585
  },
586
  {
587
  "epoch": 1.3338047181333161,
588
+ "grad_norm": 0.1757335215806961,
589
+ "learning_rate": 1.6667737995757538e-05,
590
+ "loss": 0.1361,
591
  "step": 41500
592
  },
593
  {
594
  "epoch": 1.3498746544963682,
595
+ "grad_norm": 0.28056710958480835,
596
+ "learning_rate": 1.6265989586681236e-05,
597
+ "loss": 0.1371,
598
  "step": 42000
599
  },
600
  {
601
  "epoch": 1.3659445908594203,
602
+ "grad_norm": 0.4234681725502014,
603
+ "learning_rate": 1.586424117760494e-05,
604
+ "loss": 0.1363,
605
  "step": 42500
606
  },
607
  {
608
  "epoch": 1.3820145272224722,
609
+ "grad_norm": 0.2925218641757965,
610
+ "learning_rate": 1.5462492768528637e-05,
611
+ "loss": 0.1336,
612
  "step": 43000
613
  },
614
  {
615
  "epoch": 1.398084463585524,
616
+ "grad_norm": 0.23110254108905792,
617
+ "learning_rate": 1.5060744359452336e-05,
618
+ "loss": 0.1305,
619
  "step": 43500
620
  },
621
  {
622
  "epoch": 1.4141543999485762,
623
+ "grad_norm": 0.4187003970146179,
624
+ "learning_rate": 1.4659799447194189e-05,
625
+ "loss": 0.1374,
626
  "step": 44000
627
  },
628
  {
629
  "epoch": 1.4302243363116283,
630
+ "grad_norm": 0.30868059396743774,
631
+ "learning_rate": 1.425805103811789e-05,
632
+ "loss": 0.1332,
633
  "step": 44500
634
  },
635
  {
636
  "epoch": 1.4462942726746801,
637
+ "grad_norm": 0.24373352527618408,
638
+ "learning_rate": 1.385630262904159e-05,
639
+ "loss": 0.133,
640
  "step": 45000
641
  },
642
  {
643
  "epoch": 1.4623642090377322,
644
+ "grad_norm": 0.3976458013057709,
645
+ "learning_rate": 1.345455421996529e-05,
646
+ "loss": 0.1317,
647
  "step": 45500
648
  },
649
  {
650
  "epoch": 1.4784341454007843,
651
+ "grad_norm": 0.15130922198295593,
652
+ "learning_rate": 1.3053609307707144e-05,
653
+ "loss": 0.1294,
654
  "step": 46000
655
  },
656
  {
657
  "epoch": 1.4945040817638362,
658
+ "grad_norm": 0.26361921429634094,
659
  "learning_rate": 1.2652664395448997e-05,
660
+ "loss": 0.1316,
661
  "step": 46500
662
  },
663
  {
664
  "epoch": 1.510574018126888,
665
+ "grad_norm": 0.3039293587207794,
666
  "learning_rate": 1.2250915986372695e-05,
667
+ "loss": 0.1294,
668
  "step": 47000
669
  },
670
  {
671
  "epoch": 1.5266439544899402,
672
+ "grad_norm": 0.23085398972034454,
673
  "learning_rate": 1.1849167577296394e-05,
674
+ "loss": 0.1304,
675
  "step": 47500
676
  },
677
  {
678
  "epoch": 1.5427138908529923,
679
+ "grad_norm": 0.45066356658935547,
680
+ "learning_rate": 1.1447419168220095e-05,
681
+ "loss": 0.1283,
682
  "step": 48000
683
  },
684
  {
685
  "epoch": 1.5587838272160441,
686
+ "grad_norm": 0.2428194135427475,
687
+ "learning_rate": 1.1045670759143795e-05,
688
+ "loss": 0.1279,
689
  "step": 48500
690
  },
691
  {
692
  "epoch": 1.5748537635790962,
693
+ "grad_norm": 0.15587645769119263,
694
+ "learning_rate": 1.0643922350067494e-05,
695
+ "loss": 0.1273,
696
  "step": 49000
697
  },
698
  {
699
  "epoch": 1.5909236999421483,
700
+ "grad_norm": 0.5055563449859619,
701
  "learning_rate": 1.0242977437809347e-05,
702
+ "loss": 0.127,
703
  "step": 49500
704
  },
705
  {
706
  "epoch": 1.6069936363052002,
707
+ "grad_norm": 0.31220686435699463,
708
+ "learning_rate": 9.841229028733047e-06,
709
+ "loss": 0.1284,
710
  "step": 50000
711
  },
712
  {
713
  "epoch": 1.623063572668252,
714
+ "grad_norm": 0.3776426613330841,
715
+ "learning_rate": 9.439480619656748e-06,
716
+ "loss": 0.1251,
717
  "step": 50500
718
  },
719
  {
720
  "epoch": 1.6391335090313044,
721
+ "grad_norm": 0.2834898829460144,
722
+ "learning_rate": 9.037732210580447e-06,
723
+ "loss": 0.1226,
724
  "step": 51000
725
  },
726
  {
727
  "epoch": 1.6552034453943563,
728
+ "grad_norm": 0.2295331507921219,
729
+ "learning_rate": 8.635983801504147e-06,
730
+ "loss": 0.1233,
731
  "step": 51500
732
  },
733
  {
734
  "epoch": 1.6712733817574081,
735
+ "grad_norm": 0.22921015322208405,
736
+ "learning_rate": 8.234235392427848e-06,
737
+ "loss": 0.1256,
738
  "step": 52000
739
  },
740
  {
741
  "epoch": 1.6873433181204602,
742
+ "grad_norm": 0.3294677138328552,
743
+ "learning_rate": 7.832486983351546e-06,
744
+ "loss": 0.1257,
745
  "step": 52500
746
  },
747
  {
748
  "epoch": 1.7034132544835123,
749
+ "grad_norm": 0.21186766028404236,
750
+ "learning_rate": 7.430738574275246e-06,
751
+ "loss": 0.1254,
752
  "step": 53000
753
  },
754
  {
755
  "epoch": 1.7194831908465642,
756
+ "grad_norm": 0.43346577882766724,
757
+ "learning_rate": 7.029793662017099e-06,
758
+ "loss": 0.1228,
759
  "step": 53500
760
  },
761
  {
762
  "epoch": 1.7355531272096163,
763
+ "grad_norm": 0.20274986326694489,
764
+ "learning_rate": 6.628045252940798e-06,
765
+ "loss": 0.124,
766
  "step": 54000
767
  },
768
  {
769
  "epoch": 1.7516230635726684,
770
+ "grad_norm": 0.2912587523460388,
771
+ "learning_rate": 6.2262968438644984e-06,
772
+ "loss": 0.1236,
773
  "step": 54500
774
  },
775
  {
776
  "epoch": 1.7676929999357203,
777
+ "grad_norm": 0.5663316249847412,
778
+ "learning_rate": 5.824548434788198e-06,
779
+ "loss": 0.1236,
780
  "step": 55000
781
  },
782
  {
783
  "epoch": 1.7837629362987721,
784
+ "grad_norm": 0.2563399076461792,
785
+ "learning_rate": 5.423603522530051e-06,
786
+ "loss": 0.1241,
787
  "step": 55500
788
  },
789
  {
790
  "epoch": 1.7998328726618242,
791
+ "grad_norm": 0.26923516392707825,
792
  "learning_rate": 5.022658610271903e-06,
793
+ "loss": 0.1231,
794
  "step": 56000
795
  },
796
  {
797
  "epoch": 1.8159028090248763,
798
+ "grad_norm": 0.15516141057014465,
799
  "learning_rate": 4.620910201195604e-06,
800
+ "loss": 0.1225,
801
  "step": 56500
802
  },
803
  {
804
  "epoch": 1.8319727453879282,
805
+ "grad_norm": 0.1603991985321045,
806
  "learning_rate": 4.219161792119303e-06,
807
+ "loss": 0.1236,
808
  "step": 57000
809
  },
810
  {
811
  "epoch": 1.8480426817509803,
812
+ "grad_norm": 0.3031301498413086,
813
  "learning_rate": 3.817413383043003e-06,
814
+ "loss": 0.124,
815
  "step": 57500
816
  },
817
  {
818
  "epoch": 1.8641126181140324,
819
+ "grad_norm": 0.25160399079322815,
820
+ "learning_rate": 3.4156649739667035e-06,
821
+ "loss": 0.1212,
822
  "step": 58000
823
  },
824
  {
825
  "epoch": 1.8801825544770843,
826
+ "grad_norm": 0.23327353596687317,
827
+ "learning_rate": 3.013916564890403e-06,
828
+ "loss": 0.1199,
829
  "step": 58500
830
  },
831
  {
832
  "epoch": 1.8962524908401361,
833
+ "grad_norm": 0.23530858755111694,
834
  "learning_rate": 2.6129716526322558e-06,
835
+ "loss": 0.1228,
836
  "step": 59000
837
  },
838
  {
839
  "epoch": 1.9123224272031882,
840
+ "grad_norm": 0.20596709847450256,
841
  "learning_rate": 2.211223243555956e-06,
842
+ "loss": 0.1205,
843
  "step": 59500
844
  },
845
  {
846
  "epoch": 1.9283923635662403,
847
+ "grad_norm": 0.35043200850486755,
848
+ "learning_rate": 1.8094748344796555e-06,
849
+ "loss": 0.1188,
850
  "step": 60000
851
  },
852
  {
853
  "epoch": 1.9444622999292922,
854
+ "grad_norm": 0.21463052928447723,
855
+ "learning_rate": 1.4077264254033555e-06,
856
+ "loss": 0.1225,
857
  "step": 60500
858
  },
859
  {
860
  "epoch": 1.9605322362923443,
861
+ "grad_norm": 0.27506574988365173,
862
+ "learning_rate": 1.0059780163270554e-06,
863
+ "loss": 0.1233,
864
  "step": 61000
865
  },
866
  {
867
  "epoch": 1.9766021726553964,
868
+ "grad_norm": 0.3260590732097626,
869
+ "learning_rate": 6.042296072507553e-07,
870
+ "loss": 0.1218,
871
  "step": 61500
872
  }
873
  ],
 
888
  "attributes": {}
889
  }
890
  },
891
+ "total_flos": 1.3317606196484506e+17,
892
  "train_batch_size": 32,
893
  "trial_name": null,
894
  "trial_params": null
checkpoints/checkpoint-62000/model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:21d2693f103a4b1c9756dd65ffa057b16bedd52438d1857e7abe6cc6bbbdc118
+ oid sha256:8c7cd5c5562af41ebf3f5bad6e78dc0b1d978e2c1f8c79e0b2c98beaa577e8f6
  size 242041896
checkpoints/checkpoint-62000/optimizer.pt CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:d67c69f9573de5a8dadf16db4aeb6d985098267281696ebb224c45bc379b1125
+ oid sha256:984de0c937765020e3882843f25bf26f9e754733e744a0cab794a18a0cd30d19
  size 484163514
checkpoints/checkpoint-62000/rng_state.pth CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:19b09152c000e511e6ddafcdda96b8df75e3964a1a6d6a59502f1c6b09d5600b
+ oid sha256:2dbe2b6f46d4bd694697f8d7b585c0657854163b4c972d02503f1f6dca716d22
  size 14244
checkpoints/checkpoint-62000/scaler.pt CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:654ff04f86d41e38c4c564123d36adbd0e83e01bd73996a4587ead262cae63cb
+ oid sha256:6892b9a38efebaa7ab1728c9d3e9fb9fb0ccf8b1a48d0799894268d503230ff0
  size 988
checkpoints/checkpoint-62000/scheduler.pt CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:7785e8be4cba2ff2308ebbf74c247c1df9ee9867422dbbc081226579438090ce
+ oid sha256:7954ecc6645e020e139b5a94e047d4a69048490e85d681c5cfdcc07828c41a06
  size 1064
checkpoints/checkpoint-62000/special_tokens_map.json CHANGED
@@ -101,6 +101,13 @@
  "<extra_id_98>",
  "<extra_id_99>"
  ],
+ "bos_token": {
+ "content": "<s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
  "eos_token": {
  "content": "</s>",
  "lstrip": false,
checkpoints/checkpoint-62000/tokenizer.json CHANGED
The diff for this file is too large to render.
 
checkpoints/checkpoint-62000/tokenizer_config.json CHANGED
@@ -10,7 +10,7 @@
10
  "special": true
11
  },
12
  "1": {
13
- "content": "</s>",
14
  "lstrip": false,
15
  "normalized": false,
16
  "rstrip": false,
@@ -18,414 +18,414 @@
18
  "special": true
19
  },
20
  "2": {
21
- "content": "<unk>",
22
  "lstrip": false,
23
  "normalized": false,
24
  "rstrip": false,
25
  "single_word": false,
26
  "special": true
27
  },
28
- "32000": {
29
- "content": "<extra_id_99>",
30
  "lstrip": false,
31
  "normalized": false,
32
  "rstrip": false,
33
  "single_word": false,
34
  "special": true
35
  },
36
- "32001": {
37
- "content": "<extra_id_98>",
38
  "lstrip": false,
39
  "normalized": false,
40
  "rstrip": false,
41
  "single_word": false,
42
  "special": true
43
  },
44
- "32002": {
45
- "content": "<extra_id_97>",
46
  "lstrip": false,
47
  "normalized": false,
48
  "rstrip": false,
49
  "single_word": false,
50
  "special": true
51
  },
52
- "32003": {
53
- "content": "<extra_id_96>",
54
  "lstrip": false,
55
  "normalized": false,
56
  "rstrip": false,
57
  "single_word": false,
58
  "special": true
59
  },
60
- "32004": {
61
- "content": "<extra_id_95>",
62
  "lstrip": false,
63
  "normalized": false,
64
  "rstrip": false,
65
  "single_word": false,
66
  "special": true
67
  },
68
- "32005": {
69
- "content": "<extra_id_94>",
70
  "lstrip": false,
71
  "normalized": false,
72
  "rstrip": false,
73
  "single_word": false,
74
  "special": true
75
  },
76
- "32006": {
77
- "content": "<extra_id_93>",
78
  "lstrip": false,
79
  "normalized": false,
80
  "rstrip": false,
81
  "single_word": false,
82
  "special": true
83
  },
84
- "32007": {
85
- "content": "<extra_id_92>",
86
  "lstrip": false,
87
  "normalized": false,
88
  "rstrip": false,
89
  "single_word": false,
90
  "special": true
91
  },
92
- "32008": {
93
- "content": "<extra_id_91>",
94
  "lstrip": false,
95
  "normalized": false,
96
  "rstrip": false,
97
  "single_word": false,
98
  "special": true
99
  },
100
- "32009": {
101
- "content": "<extra_id_90>",
102
  "lstrip": false,
103
  "normalized": false,
104
  "rstrip": false,
105
  "single_word": false,
106
  "special": true
107
  },
108
- "32010": {
109
- "content": "<extra_id_89>",
110
  "lstrip": false,
111
  "normalized": false,
112
  "rstrip": false,
113
  "single_word": false,
114
  "special": true
115
  },
116
- "32011": {
117
- "content": "<extra_id_88>",
118
  "lstrip": false,
119
  "normalized": false,
120
  "rstrip": false,
121
  "single_word": false,
122
  "special": true
123
  },
124
- "32012": {
125
- "content": "<extra_id_87>",
126
  "lstrip": false,
127
  "normalized": false,
128
  "rstrip": false,
129
  "single_word": false,
130
  "special": true
131
  },
132
- "32013": {
133
- "content": "<extra_id_86>",
134
  "lstrip": false,
135
  "normalized": false,
136
  "rstrip": false,
137
  "single_word": false,
138
  "special": true
139
  },
140
- "32014": {
141
- "content": "<extra_id_85>",
142
  "lstrip": false,
143
  "normalized": false,
144
  "rstrip": false,
145
  "single_word": false,
146
  "special": true
147
  },
148
- "32015": {
149
- "content": "<extra_id_84>",
150
  "lstrip": false,
151
  "normalized": false,
152
  "rstrip": false,
153
  "single_word": false,
154
  "special": true
155
  },
156
- "32016": {
157
- "content": "<extra_id_83>",
158
  "lstrip": false,
159
  "normalized": false,
160
  "rstrip": false,
161
  "single_word": false,
162
  "special": true
163
  },
164
- "32017": {
165
- "content": "<extra_id_82>",
166
  "lstrip": false,
167
  "normalized": false,
168
  "rstrip": false,
169
  "single_word": false,
170
  "special": true
171
  },
172
- "32018": {
173
- "content": "<extra_id_81>",
174
  "lstrip": false,
175
  "normalized": false,
176
  "rstrip": false,
177
  "single_word": false,
178
  "special": true
179
  },
180
- "32019": {
181
- "content": "<extra_id_80>",
182
  "lstrip": false,
183
  "normalized": false,
184
  "rstrip": false,
185
  "single_word": false,
186
  "special": true
187
  },
188
- "32020": {
189
- "content": "<extra_id_79>",
190
  "lstrip": false,
191
  "normalized": false,
192
  "rstrip": false,
193
  "single_word": false,
194
  "special": true
195
  },
196
- "32021": {
197
- "content": "<extra_id_78>",
198
  "lstrip": false,
199
  "normalized": false,
200
  "rstrip": false,
201
  "single_word": false,
202
  "special": true
203
  },
204
- "32022": {
205
- "content": "<extra_id_77>",
206
  "lstrip": false,
207
  "normalized": false,
208
  "rstrip": false,
209
  "single_word": false,
210
  "special": true
211
  },
212
- "32023": {
213
- "content": "<extra_id_76>",
214
  "lstrip": false,
215
  "normalized": false,
216
  "rstrip": false,
217
  "single_word": false,
218
  "special": true
219
  },
220
- "32024": {
221
- "content": "<extra_id_75>",
222
  "lstrip": false,
223
  "normalized": false,
224
  "rstrip": false,
225
  "single_word": false,
226
  "special": true
227
  },
228
- "32025": {
229
- "content": "<extra_id_74>",
230
  "lstrip": false,
231
  "normalized": false,
232
  "rstrip": false,
233
  "single_word": false,
234
  "special": true
235
  },
236
- "32026": {
237
- "content": "<extra_id_73>",
238
  "lstrip": false,
239
  "normalized": false,
240
  "rstrip": false,
241
  "single_word": false,
242
  "special": true
243
  },
244
- "32027": {
245
- "content": "<extra_id_72>",
246
  "lstrip": false,
247
  "normalized": false,
248
  "rstrip": false,
249
  "single_word": false,
250
  "special": true
251
  },
252
- "32028": {
253
- "content": "<extra_id_71>",
254
  "lstrip": false,
255
  "normalized": false,
256
  "rstrip": false,
257
  "single_word": false,
258
  "special": true
259
  },
260
- "32029": {
261
- "content": "<extra_id_70>",
262
  "lstrip": false,
263
  "normalized": false,
264
  "rstrip": false,
265
  "single_word": false,
266
  "special": true
267
  },
268
- "32030": {
269
- "content": "<extra_id_69>",
270
  "lstrip": false,
271
  "normalized": false,
272
  "rstrip": false,
273
  "single_word": false,
274
  "special": true
275
  },
276
- "32031": {
277
- "content": "<extra_id_68>",
278
  "lstrip": false,
279
  "normalized": false,
280
  "rstrip": false,
281
  "single_word": false,
282
  "special": true
283
  },
284
- "32032": {
285
- "content": "<extra_id_67>",
286
  "lstrip": false,
287
  "normalized": false,
288
  "rstrip": false,
289
  "single_word": false,
290
  "special": true
291
  },
292
- "32033": {
293
- "content": "<extra_id_66>",
294
  "lstrip": false,
295
  "normalized": false,
296
  "rstrip": false,
297
  "single_word": false,
298
  "special": true
299
  },
300
- "32034": {
301
- "content": "<extra_id_65>",
302
  "lstrip": false,
303
  "normalized": false,
304
  "rstrip": false,
305
  "single_word": false,
306
  "special": true
307
  },
308
- "32035": {
309
- "content": "<extra_id_64>",
310
  "lstrip": false,
311
  "normalized": false,
312
  "rstrip": false,
313
  "single_word": false,
314
  "special": true
315
  },
316
- "32036": {
317
- "content": "<extra_id_63>",
318
  "lstrip": false,
319
  "normalized": false,
320
  "rstrip": false,
321
  "single_word": false,
322
  "special": true
323
  },
324
- "32037": {
325
- "content": "<extra_id_62>",
326
  "lstrip": false,
327
  "normalized": false,
328
  "rstrip": false,
329
  "single_word": false,
330
  "special": true
331
  },
332
- "32038": {
333
- "content": "<extra_id_61>",
334
  "lstrip": false,
335
  "normalized": false,
336
  "rstrip": false,
337
  "single_word": false,
338
  "special": true
339
  },
340
- "32039": {
341
- "content": "<extra_id_60>",
342
  "lstrip": false,
343
  "normalized": false,
344
  "rstrip": false,
345
  "single_word": false,
346
  "special": true
347
  },
348
- "32040": {
349
- "content": "<extra_id_59>",
350
  "lstrip": false,
351
  "normalized": false,
352
  "rstrip": false,
353
  "single_word": false,
354
  "special": true
355
  },
356
- "32041": {
357
- "content": "<extra_id_58>",
358
  "lstrip": false,
359
  "normalized": false,
360
  "rstrip": false,
361
  "single_word": false,
362
  "special": true
363
  },
364
- "32042": {
365
- "content": "<extra_id_57>",
366
  "lstrip": false,
367
  "normalized": false,
368
  "rstrip": false,
369
  "single_word": false,
370
  "special": true
371
  },
372
- "32043": {
373
- "content": "<extra_id_56>",
374
  "lstrip": false,
375
  "normalized": false,
376
  "rstrip": false,
377
  "single_word": false,
378
  "special": true
379
  },
380
- "32044": {
381
- "content": "<extra_id_55>",
382
  "lstrip": false,
383
  "normalized": false,
384
  "rstrip": false,
385
  "single_word": false,
386
  "special": true
387
  },
388
- "32045": {
389
- "content": "<extra_id_54>",
390
  "lstrip": false,
391
  "normalized": false,
392
  "rstrip": false,
393
  "single_word": false,
394
  "special": true
395
  },
396
- "32046": {
397
- "content": "<extra_id_53>",
398
  "lstrip": false,
399
  "normalized": false,
400
  "rstrip": false,
401
  "single_word": false,
402
  "special": true
403
  },
404
- "32047": {
405
- "content": "<extra_id_52>",
406
  "lstrip": false,
407
  "normalized": false,
408
  "rstrip": false,
409
  "single_word": false,
410
  "special": true
411
  },
412
- "32048": {
413
- "content": "<extra_id_51>",
414
  "lstrip": false,
415
  "normalized": false,
416
  "rstrip": false,
417
  "single_word": false,
418
  "special": true
419
  },
420
- "32049": {
421
- "content": "<extra_id_50>",
422
  "lstrip": false,
423
  "normalized": false,
424
  "rstrip": false,
425
  "single_word": false,
426
  "special": true
427
  },
428
- "32050": {
429
  "content": "<extra_id_49>",
430
  "lstrip": false,
431
  "normalized": false,
@@ -433,392 +433,400 @@
433
  "single_word": false,
434
  "special": true
435
  },
436
- "32051": {
437
- "content": "<extra_id_48>",
438
  "lstrip": false,
439
  "normalized": false,
440
  "rstrip": false,
441
  "single_word": false,
442
  "special": true
443
  },
444
- "32052": {
445
- "content": "<extra_id_47>",
446
  "lstrip": false,
447
  "normalized": false,
448
  "rstrip": false,
449
  "single_word": false,
450
  "special": true
451
  },
452
- "32053": {
453
- "content": "<extra_id_46>",
454
  "lstrip": false,
455
  "normalized": false,
456
  "rstrip": false,
457
  "single_word": false,
458
  "special": true
459
  },
460
- "32054": {
461
- "content": "<extra_id_45>",
462
  "lstrip": false,
463
  "normalized": false,
464
  "rstrip": false,
465
  "single_word": false,
466
  "special": true
467
  },
468
- "32055": {
469
- "content": "<extra_id_44>",
470
  "lstrip": false,
471
  "normalized": false,
472
  "rstrip": false,
473
  "single_word": false,
474
  "special": true
475
  },
476
- "32056": {
477
- "content": "<extra_id_43>",
478
  "lstrip": false,
479
  "normalized": false,
480
  "rstrip": false,
481
  "single_word": false,
482
  "special": true
483
  },
484
- "32057": {
485
- "content": "<extra_id_42>",
486
  "lstrip": false,
487
  "normalized": false,
488
  "rstrip": false,
489
  "single_word": false,
490
  "special": true
491
  },
492
- "32058": {
493
- "content": "<extra_id_41>",
494
  "lstrip": false,
495
  "normalized": false,
496
  "rstrip": false,
497
  "single_word": false,
498
  "special": true
499
  },
500
- "32059": {
501
- "content": "<extra_id_40>",
502
  "lstrip": false,
503
  "normalized": false,
504
  "rstrip": false,
505
  "single_word": false,
506
  "special": true
507
  },
508
- "32060": {
509
- "content": "<extra_id_39>",
510
  "lstrip": false,
511
  "normalized": false,
512
  "rstrip": false,
513
  "single_word": false,
514
  "special": true
515
  },
516
- "32061": {
517
- "content": "<extra_id_38>",
518
  "lstrip": false,
519
  "normalized": false,
520
  "rstrip": false,
521
  "single_word": false,
522
  "special": true
523
  },
524
- "32062": {
525
- "content": "<extra_id_37>",
526
  "lstrip": false,
527
  "normalized": false,
528
  "rstrip": false,
529
  "single_word": false,
530
  "special": true
531
  },
532
- "32063": {
533
- "content": "<extra_id_36>",
534
  "lstrip": false,
535
  "normalized": false,
536
  "rstrip": false,
537
  "single_word": false,
538
  "special": true
539
  },
540
- "32064": {
541
- "content": "<extra_id_35>",
542
  "lstrip": false,
543
  "normalized": false,
544
  "rstrip": false,
545
  "single_word": false,
546
  "special": true
547
  },
548
- "32065": {
549
- "content": "<extra_id_34>",
550
  "lstrip": false,
551
  "normalized": false,
552
  "rstrip": false,
553
  "single_word": false,
554
  "special": true
555
  },
556
- "32066": {
557
- "content": "<extra_id_33>",
558
  "lstrip": false,
559
  "normalized": false,
560
  "rstrip": false,
561
  "single_word": false,
562
  "special": true
563
  },
564
- "32067": {
565
- "content": "<extra_id_32>",
566
  "lstrip": false,
567
  "normalized": false,
568
  "rstrip": false,
569
  "single_word": false,
570
  "special": true
571
  },
572
- "32068": {
573
- "content": "<extra_id_31>",
574
  "lstrip": false,
575
  "normalized": false,
576
  "rstrip": false,
577
  "single_word": false,
578
  "special": true
579
  },
580
- "32069": {
581
- "content": "<extra_id_30>",
582
  "lstrip": false,
583
  "normalized": false,
584
  "rstrip": false,
585
  "single_word": false,
586
  "special": true
587
  },
588
- "32070": {
589
- "content": "<extra_id_29>",
590
  "lstrip": false,
591
  "normalized": false,
592
  "rstrip": false,
593
  "single_word": false,
594
  "special": true
595
  },
596
- "32071": {
597
- "content": "<extra_id_28>",
598
  "lstrip": false,
599
  "normalized": false,
600
  "rstrip": false,
601
  "single_word": false,
602
  "special": true
603
  },
604
- "32072": {
605
- "content": "<extra_id_27>",
606
  "lstrip": false,
607
  "normalized": false,
608
  "rstrip": false,
609
  "single_word": false,
610
  "special": true
611
  },
612
- "32073": {
613
- "content": "<extra_id_26>",
614
  "lstrip": false,
615
  "normalized": false,
616
  "rstrip": false,
617
  "single_word": false,
618
  "special": true
619
  },
620
- "32074": {
621
- "content": "<extra_id_25>",
622
  "lstrip": false,
623
  "normalized": false,
624
  "rstrip": false,
625
  "single_word": false,
626
  "special": true
627
  },
628
- "32075": {
629
- "content": "<extra_id_24>",
630
  "lstrip": false,
631
  "normalized": false,
632
  "rstrip": false,
633
  "single_word": false,
634
  "special": true
635
  },
636
- "32076": {
637
- "content": "<extra_id_23>",
638
  "lstrip": false,
639
  "normalized": false,
640
  "rstrip": false,
641
  "single_word": false,
642
  "special": true
643
  },
644
- "32077": {
645
- "content": "<extra_id_22>",
646
  "lstrip": false,
647
  "normalized": false,
648
  "rstrip": false,
649
  "single_word": false,
650
  "special": true
651
  },
652
- "32078": {
653
- "content": "<extra_id_21>",
654
  "lstrip": false,
655
  "normalized": false,
656
  "rstrip": false,
657
  "single_word": false,
658
  "special": true
659
  },
660
- "32079": {
661
- "content": "<extra_id_20>",
662
  "lstrip": false,
663
  "normalized": false,
664
  "rstrip": false,
665
  "single_word": false,
666
  "special": true
667
  },
668
- "32080": {
669
- "content": "<extra_id_19>",
670
  "lstrip": false,
671
  "normalized": false,
672
  "rstrip": false,
673
  "single_word": false,
674
  "special": true
675
  },
676
- "32081": {
677
- "content": "<extra_id_18>",
678
  "lstrip": false,
679
  "normalized": false,
680
  "rstrip": false,
681
  "single_word": false,
682
  "special": true
683
  },
684
- "32082": {
685
- "content": "<extra_id_17>",
686
  "lstrip": false,
687
  "normalized": false,
688
  "rstrip": false,
689
  "single_word": false,
690
  "special": true
691
  },
692
- "32083": {
693
- "content": "<extra_id_16>",
694
  "lstrip": false,
695
  "normalized": false,
696
  "rstrip": false,
697
  "single_word": false,
698
  "special": true
699
  },
700
- "32084": {
701
- "content": "<extra_id_15>",
702
  "lstrip": false,
703
  "normalized": false,
704
  "rstrip": false,
705
  "single_word": false,
706
  "special": true
707
  },
708
- "32085": {
709
- "content": "<extra_id_14>",
710
  "lstrip": false,
711
  "normalized": false,
712
  "rstrip": false,
713
  "single_word": false,
714
  "special": true
715
  },
716
- "32086": {
717
- "content": "<extra_id_13>",
718
  "lstrip": false,
719
  "normalized": false,
720
  "rstrip": false,
721
  "single_word": false,
722
  "special": true
723
  },
724
- "32087": {
725
- "content": "<extra_id_12>",
726
  "lstrip": false,
727
  "normalized": false,
728
  "rstrip": false,
729
  "single_word": false,
730
  "special": true
731
  },
732
- "32088": {
733
- "content": "<extra_id_11>",
734
  "lstrip": false,
735
  "normalized": false,
736
  "rstrip": false,
737
  "single_word": false,
738
  "special": true
739
  },
740
- "32089": {
741
- "content": "<extra_id_10>",
742
  "lstrip": false,
743
  "normalized": false,
744
  "rstrip": false,
745
  "single_word": false,
746
  "special": true
747
  },
748
- "32090": {
749
- "content": "<extra_id_9>",
750
  "lstrip": false,
751
  "normalized": false,
752
  "rstrip": false,
753
  "single_word": false,
754
  "special": true
755
  },
756
- "32091": {
757
- "content": "<extra_id_8>",
758
  "lstrip": false,
759
  "normalized": false,
760
  "rstrip": false,
761
  "single_word": false,
762
  "special": true
763
  },
764
- "32092": {
765
- "content": "<extra_id_7>",
766
  "lstrip": false,
767
  "normalized": false,
768
  "rstrip": false,
769
  "single_word": false,
770
  "special": true
771
  },
772
- "32093": {
773
- "content": "<extra_id_6>",
774
  "lstrip": false,
775
  "normalized": false,
776
  "rstrip": false,
777
  "single_word": false,
778
  "special": true
779
  },
780
- "32094": {
781
- "content": "<extra_id_5>",
782
  "lstrip": false,
783
  "normalized": false,
784
  "rstrip": false,
785
  "single_word": false,
786
  "special": true
787
  },
788
- "32095": {
789
- "content": "<extra_id_4>",
790
  "lstrip": false,
791
  "normalized": false,
792
  "rstrip": false,
793
  "single_word": false,
794
  "special": true
795
  },
796
- "32096": {
797
- "content": "<extra_id_3>",
798
  "lstrip": false,
799
  "normalized": false,
800
  "rstrip": false,
801
  "single_word": false,
802
  "special": true
803
  },
804
- "32097": {
805
- "content": "<extra_id_2>",
806
  "lstrip": false,
807
  "normalized": false,
808
  "rstrip": false,
809
  "single_word": false,
810
  "special": true
811
  },
812
- "32098": {
813
- "content": "<extra_id_1>",
814
  "lstrip": false,
815
  "normalized": false,
816
  "rstrip": false,
817
  "single_word": false,
818
  "special": true
819
  },
820
- "32099": {
821
- "content": "<extra_id_0>",
822
  "lstrip": false,
823
  "normalized": false,
824
  "rstrip": false,
@@ -928,12 +936,13 @@
928
  "<extra_id_98>",
929
  "<extra_id_99>"
930
  ],
931
- "clean_up_tokenization_spaces": true,
932
  "eos_token": "</s>",
933
  "extra_ids": 100,
934
  "extra_special_tokens": {},
935
- "model_max_length": 512,
936
  "pad_token": "<pad>",
937
- "tokenizer_class": "T5Tokenizer",
938
  "unk_token": "<unk>"
939
  }
 
10
  "special": true
11
  },
12
  "1": {
13
+ "content": "<s>",
14
  "lstrip": false,
15
  "normalized": false,
16
  "rstrip": false,
 
18
  "special": true
19
  },
20
  "2": {
21
+ "content": "</s>",
22
  "lstrip": false,
23
  "normalized": false,
24
  "rstrip": false,
25
  "single_word": false,
26
  "special": true
27
  },
28
+ "3": {
29
+ "content": "<unk>",
30
  "lstrip": false,
31
  "normalized": false,
32
  "rstrip": false,
33
  "single_word": false,
34
  "special": true
35
  },
36
+ "8000": {
37
+ "content": "<extra_id_0>",
38
  "lstrip": false,
39
  "normalized": false,
40
  "rstrip": false,
41
  "single_word": false,
42
  "special": true
43
  },
44
+ "8001": {
45
+ "content": "<extra_id_1>",
46
  "lstrip": false,
47
  "normalized": false,
48
  "rstrip": false,
49
  "single_word": false,
50
  "special": true
51
  },
52
+ "8002": {
53
+ "content": "<extra_id_2>",
54
  "lstrip": false,
55
  "normalized": false,
56
  "rstrip": false,
57
  "single_word": false,
58
  "special": true
59
  },
60
+ "8003": {
61
+ "content": "<extra_id_3>",
62
  "lstrip": false,
63
  "normalized": false,
64
  "rstrip": false,
65
  "single_word": false,
66
  "special": true
67
  },
68
+ "8004": {
69
+ "content": "<extra_id_4>",
70
  "lstrip": false,
71
  "normalized": false,
72
  "rstrip": false,
73
  "single_word": false,
74
  "special": true
75
  },
76
+ "8005": {
77
+ "content": "<extra_id_5>",
78
  "lstrip": false,
79
  "normalized": false,
80
  "rstrip": false,
81
  "single_word": false,
82
  "special": true
83
  },
84
+ "8006": {
85
+ "content": "<extra_id_6>",
86
  "lstrip": false,
87
  "normalized": false,
88
  "rstrip": false,
89
  "single_word": false,
90
  "special": true
91
  },
92
+ "8007": {
93
+ "content": "<extra_id_7>",
94
  "lstrip": false,
95
  "normalized": false,
96
  "rstrip": false,
97
  "single_word": false,
98
  "special": true
99
  },
100
+ "8008": {
101
+ "content": "<extra_id_8>",
102
  "lstrip": false,
103
  "normalized": false,
104
  "rstrip": false,
105
  "single_word": false,
106
  "special": true
107
  },
108
+ "8009": {
109
+ "content": "<extra_id_9>",
110
  "lstrip": false,
111
  "normalized": false,
112
  "rstrip": false,
113
  "single_word": false,
114
  "special": true
115
  },
116
+ "8010": {
117
+ "content": "<extra_id_10>",
118
  "lstrip": false,
119
  "normalized": false,
120
  "rstrip": false,
121
  "single_word": false,
122
  "special": true
123
  },
124
+ "8011": {
125
+ "content": "<extra_id_11>",
126
  "lstrip": false,
127
  "normalized": false,
128
  "rstrip": false,
129
  "single_word": false,
130
  "special": true
131
  },
132
+ "8012": {
133
+ "content": "<extra_id_12>",
134
  "lstrip": false,
135
  "normalized": false,
136
  "rstrip": false,
137
  "single_word": false,
138
  "special": true
139
  },
140
+ "8013": {
141
+ "content": "<extra_id_13>",
142
  "lstrip": false,
143
  "normalized": false,
144
  "rstrip": false,
145
  "single_word": false,
146
  "special": true
147
  },
148
+ "8014": {
149
+ "content": "<extra_id_14>",
150
  "lstrip": false,
151
  "normalized": false,
152
  "rstrip": false,
153
  "single_word": false,
154
  "special": true
155
  },
156
+ "8015": {
157
+ "content": "<extra_id_15>",
158
  "lstrip": false,
159
  "normalized": false,
160
  "rstrip": false,
161
  "single_word": false,
162
  "special": true
163
  },
164
+ "8016": {
165
+ "content": "<extra_id_16>",
166
  "lstrip": false,
167
  "normalized": false,
168
  "rstrip": false,
169
  "single_word": false,
170
  "special": true
171
  },
172
+ "8017": {
173
+ "content": "<extra_id_17>",
174
  "lstrip": false,
175
  "normalized": false,
176
  "rstrip": false,
177
  "single_word": false,
178
  "special": true
179
  },
180
+ "8018": {
181
+ "content": "<extra_id_18>",
182
  "lstrip": false,
183
  "normalized": false,
184
  "rstrip": false,
185
  "single_word": false,
186
  "special": true
187
  },
188
+ "8019": {
189
+ "content": "<extra_id_19>",
190
  "lstrip": false,
191
  "normalized": false,
192
  "rstrip": false,
193
  "single_word": false,
194
  "special": true
195
  },
196
+ "8020": {
197
+ "content": "<extra_id_20>",
198
  "lstrip": false,
199
  "normalized": false,
200
  "rstrip": false,
201
  "single_word": false,
202
  "special": true
203
  },
204
+ "8021": {
205
+ "content": "<extra_id_21>",
206
  "lstrip": false,
207
  "normalized": false,
208
  "rstrip": false,
209
  "single_word": false,
210
  "special": true
211
  },
212
+ "8022": {
213
+ "content": "<extra_id_22>",
214
  "lstrip": false,
215
  "normalized": false,
216
  "rstrip": false,
217
  "single_word": false,
218
  "special": true
219
  },
220
+ "8023": {
221
+ "content": "<extra_id_23>",
222
  "lstrip": false,
223
  "normalized": false,
224
  "rstrip": false,
225
  "single_word": false,
226
  "special": true
227
  },
228
+ "8024": {
229
+ "content": "<extra_id_24>",
230
  "lstrip": false,
231
  "normalized": false,
232
  "rstrip": false,
233
  "single_word": false,
234
  "special": true
235
  },
236
+ "8025": {
237
+ "content": "<extra_id_25>",
238
  "lstrip": false,
239
  "normalized": false,
240
  "rstrip": false,
241
  "single_word": false,
242
  "special": true
243
  },
244
+ "8026": {
245
+ "content": "<extra_id_26>",
246
  "lstrip": false,
247
  "normalized": false,
248
  "rstrip": false,
249
  "single_word": false,
250
  "special": true
251
  },
252
+ "8027": {
253
+ "content": "<extra_id_27>",
254
  "lstrip": false,
255
  "normalized": false,
256
  "rstrip": false,
257
  "single_word": false,
258
  "special": true
259
  },
260
+ "8028": {
261
+ "content": "<extra_id_28>",
262
  "lstrip": false,
263
  "normalized": false,
264
  "rstrip": false,
265
  "single_word": false,
266
  "special": true
267
  },
268
+ "8029": {
269
+ "content": "<extra_id_29>",
270
  "lstrip": false,
271
  "normalized": false,
272
  "rstrip": false,
273
  "single_word": false,
274
  "special": true
275
  },
276
+ "8030": {
277
+ "content": "<extra_id_30>",
278
  "lstrip": false,
279
  "normalized": false,
280
  "rstrip": false,
281
  "single_word": false,
282
  "special": true
283
  },
284
+ "8031": {
285
+ "content": "<extra_id_31>",
286
  "lstrip": false,
287
  "normalized": false,
288
  "rstrip": false,
289
  "single_word": false,
290
  "special": true
291
  },
292
+ "8032": {
293
+ "content": "<extra_id_32>",
294
  "lstrip": false,
295
  "normalized": false,
296
  "rstrip": false,
297
  "single_word": false,
298
  "special": true
299
  },
300
+ "8033": {
301
+ "content": "<extra_id_33>",
302
  "lstrip": false,
303
  "normalized": false,
304
  "rstrip": false,
305
  "single_word": false,
306
  "special": true
307
  },
308
+ "8034": {
309
+ "content": "<extra_id_34>",
310
  "lstrip": false,
311
  "normalized": false,
312
  "rstrip": false,
313
  "single_word": false,
314
  "special": true
315
  },
316
+ "8035": {
317
+ "content": "<extra_id_35>",
318
  "lstrip": false,
319
  "normalized": false,
320
  "rstrip": false,
321
  "single_word": false,
322
  "special": true
323
  },
324
+ "8036": {
325
+ "content": "<extra_id_36>",
326
  "lstrip": false,
327
  "normalized": false,
328
  "rstrip": false,
329
  "single_word": false,
330
  "special": true
331
  },
332
+ "8037": {
333
+ "content": "<extra_id_37>",
334
  "lstrip": false,
335
  "normalized": false,
336
  "rstrip": false,
337
  "single_word": false,
338
  "special": true
339
  },
340
+ "8038": {
341
+ "content": "<extra_id_38>",
342
  "lstrip": false,
343
  "normalized": false,
344
  "rstrip": false,
345
  "single_word": false,
346
  "special": true
347
  },
348
+ "8039": {
349
+ "content": "<extra_id_39>",
350
  "lstrip": false,
351
  "normalized": false,
352
  "rstrip": false,
353
  "single_word": false,
354
  "special": true
355
  },
356
+ "8040": {
357
+ "content": "<extra_id_40>",
358
  "lstrip": false,
359
  "normalized": false,
360
  "rstrip": false,
361
  "single_word": false,
362
  "special": true
363
  },
364
+ "8041": {
365
+ "content": "<extra_id_41>",
366
  "lstrip": false,
367
  "normalized": false,
368
  "rstrip": false,
369
  "single_word": false,
370
  "special": true
371
  },
372
+ "8042": {
373
+ "content": "<extra_id_42>",
374
  "lstrip": false,
375
  "normalized": false,
376
  "rstrip": false,
377
  "single_word": false,
378
  "special": true
379
  },
380
+ "8043": {
381
+ "content": "<extra_id_43>",
382
  "lstrip": false,
383
  "normalized": false,
384
  "rstrip": false,
385
  "single_word": false,
386
  "special": true
387
  },
388
+ "8044": {
389
+ "content": "<extra_id_44>",
390
  "lstrip": false,
391
  "normalized": false,
392
  "rstrip": false,
393
  "single_word": false,
394
  "special": true
395
  },
396
+ "8045": {
397
+ "content": "<extra_id_45>",
398
  "lstrip": false,
399
  "normalized": false,
400
  "rstrip": false,
401
  "single_word": false,
402
  "special": true
403
  },
404
+ "8046": {
405
+ "content": "<extra_id_46>",
406
  "lstrip": false,
407
  "normalized": false,
408
  "rstrip": false,
409
  "single_word": false,
410
  "special": true
411
  },
412
+ "8047": {
413
+ "content": "<extra_id_47>",
414
  "lstrip": false,
415
  "normalized": false,
416
  "rstrip": false,
417
  "single_word": false,
418
  "special": true
419
  },
420
+ "8048": {
421
+ "content": "<extra_id_48>",
422
  "lstrip": false,
423
  "normalized": false,
424
  "rstrip": false,
425
  "single_word": false,
426
  "special": true
427
  },
428
+ "8049": {
429
  "content": "<extra_id_49>",
430
  "lstrip": false,
431
  "normalized": false,
 
433
  "single_word": false,
434
  "special": true
435
  },
436
+ "8050": {
437
+ "content": "<extra_id_50>",
438
  "lstrip": false,
439
  "normalized": false,
440
  "rstrip": false,
441
  "single_word": false,
442
  "special": true
443
  },
444
+ "8051": {
445
+ "content": "<extra_id_51>",
446
  "lstrip": false,
447
  "normalized": false,
448
  "rstrip": false,
449
  "single_word": false,
450
  "special": true
451
  },
452
+ "8052": {
453
+ "content": "<extra_id_52>",
454
  "lstrip": false,
455
  "normalized": false,
456
  "rstrip": false,
457
  "single_word": false,
458
  "special": true
459
  },
460
+ "8053": {
461
+ "content": "<extra_id_53>",
462
  "lstrip": false,
463
  "normalized": false,
464
  "rstrip": false,
465
  "single_word": false,
466
  "special": true
467
  },
468
+ "8054": {
469
+ "content": "<extra_id_54>",
470
  "lstrip": false,
471
  "normalized": false,
472
  "rstrip": false,
473
  "single_word": false,
474
  "special": true
475
  },
476
+ "8055": {
477
+ "content": "<extra_id_55>",
478
  "lstrip": false,
479
  "normalized": false,
480
  "rstrip": false,
481
  "single_word": false,
482
  "special": true
483
  },
484
+ "8056": {
485
+ "content": "<extra_id_56>",
486
  "lstrip": false,
487
  "normalized": false,
488
  "rstrip": false,
489
  "single_word": false,
490
  "special": true
491
  },
492
+ "8057": {
493
+ "content": "<extra_id_57>",
494
  "lstrip": false,
495
  "normalized": false,
496
  "rstrip": false,
497
  "single_word": false,
498
  "special": true
499
  },
500
+ "8058": {
501
+ "content": "<extra_id_58>",
502
  "lstrip": false,
503
  "normalized": false,
504
  "rstrip": false,
505
  "single_word": false,
506
  "special": true
507
  },
508
+ "8059": {
509
+ "content": "<extra_id_59>",
510
  "lstrip": false,
511
  "normalized": false,
512
  "rstrip": false,
513
  "single_word": false,
514
  "special": true
515
  },
516
+ "8060": {
517
+ "content": "<extra_id_60>",
518
  "lstrip": false,
519
  "normalized": false,
520
  "rstrip": false,
521
  "single_word": false,
522
  "special": true
523
  },
524
+ "8061": {
525
+ "content": "<extra_id_61>",
526
  "lstrip": false,
527
  "normalized": false,
528
  "rstrip": false,
529
  "single_word": false,
530
  "special": true
531
  },
532
+ "8062": {
533
+ "content": "<extra_id_62>",
534
  "lstrip": false,
535
  "normalized": false,
536
  "rstrip": false,
537
  "single_word": false,
538
  "special": true
539
  },
540
+ "8063": {
541
+ "content": "<extra_id_63>",
542
  "lstrip": false,
543
  "normalized": false,
544
  "rstrip": false,
545
  "single_word": false,
546
  "special": true
547
  },
548
+ "8064": {
549
+ "content": "<extra_id_64>",
550
  "lstrip": false,
551
  "normalized": false,
552
  "rstrip": false,
553
  "single_word": false,
554
  "special": true
555
  },
556
+ "8065": {
557
+ "content": "<extra_id_65>",
558
  "lstrip": false,
559
  "normalized": false,
560
  "rstrip": false,
561
  "single_word": false,
562
  "special": true
563
  },
564
+ "8066": {
565
+ "content": "<extra_id_66>",
566
  "lstrip": false,
567
  "normalized": false,
568
  "rstrip": false,
569
  "single_word": false,
570
  "special": true
571
  },
572
+ "8067": {
573
+ "content": "<extra_id_67>",
574
  "lstrip": false,
575
  "normalized": false,
576
  "rstrip": false,
577
  "single_word": false,
578
  "special": true
579
  },
580
+ "8068": {
581
+ "content": "<extra_id_68>",
582
  "lstrip": false,
583
  "normalized": false,
584
  "rstrip": false,
585
  "single_word": false,
586
  "special": true
587
  },
588
+ "8069": {
589
+ "content": "<extra_id_69>",
590
  "lstrip": false,
591
  "normalized": false,
592
  "rstrip": false,
593
  "single_word": false,
594
  "special": true
595
  },
596
+ "8070": {
597
+ "content": "<extra_id_70>",
598
  "lstrip": false,
599
  "normalized": false,
600
  "rstrip": false,
601
  "single_word": false,
602
  "special": true
603
  },
604
+ "8071": {
605
+ "content": "<extra_id_71>",
606
  "lstrip": false,
607
  "normalized": false,
608
  "rstrip": false,
609
  "single_word": false,
610
  "special": true
611
  },
612
+ "8072": {
613
+ "content": "<extra_id_72>",
614
  "lstrip": false,
615
  "normalized": false,
616
  "rstrip": false,
617
  "single_word": false,
618
  "special": true
619
  },
620
+ "8073": {
621
+ "content": "<extra_id_73>",
622
  "lstrip": false,
623
  "normalized": false,
624
  "rstrip": false,
625
  "single_word": false,
626
  "special": true
627
  },
628
+ "8074": {
629
+ "content": "<extra_id_74>",
630
  "lstrip": false,
631
  "normalized": false,
632
  "rstrip": false,
633
  "single_word": false,
634
  "special": true
635
  },
636
+ "8075": {
637
+ "content": "<extra_id_75>",
638
  "lstrip": false,
639
  "normalized": false,
640
  "rstrip": false,
641
  "single_word": false,
642
  "special": true
643
  },
644
+ "8076": {
645
+ "content": "<extra_id_76>",
646
  "lstrip": false,
647
  "normalized": false,
648
  "rstrip": false,
649
  "single_word": false,
650
  "special": true
651
  },
652
+ "8077": {
653
+ "content": "<extra_id_77>",
654
  "lstrip": false,
655
  "normalized": false,
656
  "rstrip": false,
657
  "single_word": false,
658
  "special": true
659
  },
660
+ "8078": {
661
+ "content": "<extra_id_78>",
662
  "lstrip": false,
663
  "normalized": false,
664
  "rstrip": false,
665
  "single_word": false,
666
  "special": true
667
  },
668
+ "8079": {
669
+ "content": "<extra_id_79>",
670
  "lstrip": false,
671
  "normalized": false,
672
  "rstrip": false,
673
  "single_word": false,
674
  "special": true
675
  },
676
+ "8080": {
677
+ "content": "<extra_id_80>",
678
  "lstrip": false,
679
  "normalized": false,
680
  "rstrip": false,
681
  "single_word": false,
682
  "special": true
683
  },
684
+ "8081": {
685
+ "content": "<extra_id_81>",
686
  "lstrip": false,
687
  "normalized": false,
688
  "rstrip": false,
689
  "single_word": false,
690
  "special": true
691
  },
692
+ "8082": {
693
+ "content": "<extra_id_82>",
694
  "lstrip": false,
695
  "normalized": false,
696
  "rstrip": false,
697
  "single_word": false,
698
  "special": true
699
  },
700
+ "8083": {
701
+ "content": "<extra_id_83>",
702
  "lstrip": false,
703
  "normalized": false,
704
  "rstrip": false,
705
  "single_word": false,
706
  "special": true
707
  },
708
+ "8084": {
709
+ "content": "<extra_id_84>",
710
  "lstrip": false,
711
  "normalized": false,
712
  "rstrip": false,
713
  "single_word": false,
714
  "special": true
715
  },
716
+ "8085": {
717
+ "content": "<extra_id_85>",
718
  "lstrip": false,
719
  "normalized": false,
720
  "rstrip": false,
721
  "single_word": false,
722
  "special": true
723
  },
724
+ "8086": {
725
+ "content": "<extra_id_86>",
726
  "lstrip": false,
727
  "normalized": false,
728
  "rstrip": false,
729
  "single_word": false,
730
  "special": true
731
  },
732
+ "8087": {
733
+ "content": "<extra_id_87>",
734
  "lstrip": false,
735
  "normalized": false,
736
  "rstrip": false,
737
  "single_word": false,
738
  "special": true
739
  },
740
+ "8088": {
741
+ "content": "<extra_id_88>",
742
  "lstrip": false,
743
  "normalized": false,
744
  "rstrip": false,
745
  "single_word": false,
746
  "special": true
747
  },
748
+ "8089": {
749
+ "content": "<extra_id_89>",
750
  "lstrip": false,
751
  "normalized": false,
752
  "rstrip": false,
753
  "single_word": false,
754
  "special": true
755
  },
756
+ "8090": {
757
+ "content": "<extra_id_90>",
758
  "lstrip": false,
759
  "normalized": false,
760
  "rstrip": false,
761
  "single_word": false,
762
  "special": true
763
  },
764
+ "8091": {
765
+ "content": "<extra_id_91>",
766
  "lstrip": false,
767
  "normalized": false,
768
  "rstrip": false,
769
  "single_word": false,
770
  "special": true
771
  },
772
+ "8092": {
773
+ "content": "<extra_id_92>",
774
  "lstrip": false,
775
  "normalized": false,
776
  "rstrip": false,
777
  "single_word": false,
778
  "special": true
779
  },
780
+ "8093": {
781
+ "content": "<extra_id_93>",
782
  "lstrip": false,
783
  "normalized": false,
784
  "rstrip": false,
785
  "single_word": false,
786
  "special": true
787
  },
788
+ "8094": {
789
+ "content": "<extra_id_94>",
790
  "lstrip": false,
791
  "normalized": false,
792
  "rstrip": false,
793
  "single_word": false,
794
  "special": true
795
  },
796
+ "8095": {
797
+ "content": "<extra_id_95>",
798
  "lstrip": false,
799
  "normalized": false,
800
  "rstrip": false,
801
  "single_word": false,
802
  "special": true
803
  },
804
+ "8096": {
805
+ "content": "<extra_id_96>",
806
  "lstrip": false,
807
  "normalized": false,
808
  "rstrip": false,
809
  "single_word": false,
810
  "special": true
811
  },
812
+ "8097": {
813
+ "content": "<extra_id_97>",
814
  "lstrip": false,
815
  "normalized": false,
816
  "rstrip": false,
817
  "single_word": false,
818
  "special": true
819
  },
820
+ "8098": {
821
+ "content": "<extra_id_98>",
822
+ "lstrip": false,
823
+ "normalized": false,
824
+ "rstrip": false,
825
+ "single_word": false,
826
+ "special": true
827
+ },
828
+ "8099": {
829
+ "content": "<extra_id_99>",
830
  "lstrip": false,
831
  "normalized": false,
832
  "rstrip": false,
 
936
  "<extra_id_98>",
937
  "<extra_id_99>"
938
  ],
939
+ "bos_token": "<s>",
940
+ "clean_up_tokenization_spaces": false,
941
  "eos_token": "</s>",
942
  "extra_ids": 100,
943
  "extra_special_tokens": {},
944
+ "model_max_length": 1000000000000000019884624838656,
945
  "pad_token": "<pad>",
946
+ "tokenizer_class": "T5TokenizerFast",
947
  "unk_token": "<unk>"
948
  }
checkpoints/checkpoint-62000/trainer_state.json CHANGED
@@ -11,870 +11,870 @@
11
  "log_history": [
12
  {
13
  "epoch": 0.016069936363052,
14
- "grad_norm": 0.3969729542732239,
15
- "learning_rate": 4.960146557819631e-05,
16
- "loss": 2.05,
17
  "step": 500
18
  },
19
  {
20
  "epoch": 0.032139872726104,
21
- "grad_norm": 0.3822907507419586,
22
- "learning_rate": 4.919971716912001e-05,
23
- "loss": 1.1207,
24
  "step": 1000
25
  },
26
  {
27
  "epoch": 0.04820980908915601,
28
- "grad_norm": 0.36019280552864075,
29
- "learning_rate": 4.879796876004371e-05,
30
- "loss": 0.9225,
31
  "step": 1500
32
  },
33
  {
34
  "epoch": 0.064279745452208,
35
- "grad_norm": 0.30364033579826355,
36
- "learning_rate": 4.8396220350967415e-05,
37
- "loss": 0.8244,
38
  "step": 2000
39
  },
40
  {
41
  "epoch": 0.08034968181526002,
42
- "grad_norm": 0.45634394884109497,
43
- "learning_rate": 4.799447194189111e-05,
44
- "loss": 0.7506,
45
  "step": 2500
46
  },
47
  {
48
  "epoch": 0.09641961817831202,
49
- "grad_norm": 0.3562425374984741,
50
- "learning_rate": 4.759272353281481e-05,
51
- "loss": 0.7012,
52
  "step": 3000
53
  },
54
  {
55
  "epoch": 0.11248955454136401,
56
- "grad_norm": 0.33726808428764343,
57
- "learning_rate": 4.719097512373851e-05,
58
- "loss": 0.6706,
59
  "step": 3500
60
  },
61
  {
62
  "epoch": 0.128559490904416,
63
- "grad_norm": 0.30098849534988403,
64
- "learning_rate": 4.678922671466221e-05,
65
- "loss": 0.6308,
66
  "step": 4000
67
  },
68
  {
69
  "epoch": 0.14462942726746802,
70
- "grad_norm": 0.29443585872650146,
71
- "learning_rate": 4.6387478305585915e-05,
72
- "loss": 0.6141,
73
  "step": 4500
74
  },
75
  {
76
  "epoch": 0.16069936363052004,
77
- "grad_norm": 0.25647810101509094,
78
- "learning_rate": 4.598572989650961e-05,
79
- "loss": 0.5866,
80
  "step": 5000
81
  },
82
  {
83
  "epoch": 0.17676929999357202,
84
- "grad_norm": 0.2516370415687561,
85
- "learning_rate": 4.558398148743331e-05,
86
- "loss": 0.5665,
87
  "step": 5500
88
  },
89
  {
90
  "epoch": 0.19283923635662403,
91
- "grad_norm": 0.3337278366088867,
92
- "learning_rate": 4.518223307835701e-05,
93
- "loss": 0.5427,
94
  "step": 6000
95
  },
96
  {
97
  "epoch": 0.20890917271967602,
98
- "grad_norm": 0.2592964470386505,
99
- "learning_rate": 4.478048466928072e-05,
100
- "loss": 0.5323,
101
  "step": 6500
102
  },
103
  {
104
  "epoch": 0.22497910908272803,
105
- "grad_norm": 0.28550606966018677,
106
- "learning_rate": 4.437873626020441e-05,
107
- "loss": 0.5187,
108
  "step": 7000
109
  },
110
  {
111
  "epoch": 0.24104904544578004,
112
- "grad_norm": 0.26474013924598694,
113
- "learning_rate": 4.397698785112811e-05,
114
- "loss": 0.5058,
115
  "step": 7500
116
  },
117
  {
118
  "epoch": 0.257118981808832,
119
- "grad_norm": 0.3018198013305664,
120
- "learning_rate": 4.3575239442051814e-05,
121
- "loss": 0.5013,
122
  "step": 8000
123
  },
124
  {
125
  "epoch": 0.27318891817188407,
126
- "grad_norm": 0.2628585994243622,
127
- "learning_rate": 4.317349103297551e-05,
128
- "loss": 0.4883,
129
  "step": 8500
130
  },
131
  {
132
  "epoch": 0.28925885453493605,
133
- "grad_norm": 0.30172979831695557,
134
- "learning_rate": 4.277174262389921e-05,
135
- "loss": 0.4795,
136
  "step": 9000
137
  },
138
  {
139
  "epoch": 0.30532879089798803,
140
- "grad_norm": 0.25293004512786865,
141
- "learning_rate": 4.236999421482291e-05,
142
- "loss": 0.4682,
143
  "step": 9500
144
  },
145
  {
146
  "epoch": 0.3213987272610401,
147
- "grad_norm": 0.2726214528083801,
148
- "learning_rate": 4.196824580574661e-05,
149
- "loss": 0.4641,
150
  "step": 10000
151
  },
152
  {
153
  "epoch": 0.33746866362409206,
154
- "grad_norm": 0.2570224106311798,
155
- "learning_rate": 4.1566497396670314e-05,
156
- "loss": 0.4556,
157
  "step": 10500
158
  },
159
  {
160
  "epoch": 0.35353859998714404,
161
- "grad_norm": 0.26380738615989685,
162
- "learning_rate": 4.1164748987594006e-05,
163
- "loss": 0.449,
164
  "step": 11000
165
  },
166
  {
167
  "epoch": 0.369608536350196,
168
- "grad_norm": 0.2555176913738251,
169
- "learning_rate": 4.076300057851771e-05,
170
- "loss": 0.4412,
171
  "step": 11500
172
  },
173
  {
174
  "epoch": 0.38567847271324807,
175
- "grad_norm": 0.2122594565153122,
176
- "learning_rate": 4.036125216944141e-05,
177
- "loss": 0.4365,
178
  "step": 12000
179
  },
180
  {
181
  "epoch": 0.40174840907630005,
182
- "grad_norm": 0.2333071529865265,
183
- "learning_rate": 3.9959503760365116e-05,
184
- "loss": 0.433,
185
  "step": 12500
186
  },
187
  {
188
  "epoch": 0.41781834543935203,
189
- "grad_norm": 0.24873752892017365,
190
- "learning_rate": 3.955775535128881e-05,
191
- "loss": 0.4283,
192
  "step": 13000
193
  },
194
  {
195
  "epoch": 0.4338882818024041,
196
- "grad_norm": 0.32416871190071106,
197
- "learning_rate": 3.915600694221251e-05,
198
- "loss": 0.4218,
199
  "step": 13500
200
  },
201
  {
202
  "epoch": 0.44995821816545606,
203
- "grad_norm": 0.23515433073043823,
204
- "learning_rate": 3.875425853313621e-05,
205
- "loss": 0.4139,
206
  "step": 14000
207
  },
208
  {
209
  "epoch": 0.46602815452850804,
210
- "grad_norm": 0.22002151608467102,
211
- "learning_rate": 3.8353313620878064e-05,
212
- "loss": 0.417,
213
  "step": 14500
214
  },
215
  {
216
  "epoch": 0.4820980908915601,
217
- "grad_norm": 0.251897931098938,
218
- "learning_rate": 3.795156521180176e-05,
219
- "loss": 0.4106,
220
  "step": 15000
221
  },
222
  {
223
  "epoch": 0.49816802725461207,
224
- "grad_norm": 0.26212435960769653,
225
- "learning_rate": 3.754981680272546e-05,
226
- "loss": 0.4037,
227
  "step": 15500
228
  },
229
  {
230
  "epoch": 0.514237963617664,
231
- "grad_norm": 0.2718159258365631,
232
  "learning_rate": 3.714887189046731e-05,
233
- "loss": 0.402,
234
  "step": 16000
235
  },
236
  {
237
  "epoch": 0.530307899980716,
238
- "grad_norm": 0.23812739551067352,
239
- "learning_rate": 3.674712348139102e-05,
240
- "loss": 0.3953,
241
  "step": 16500
242
  },
243
  {
244
  "epoch": 0.5463778363437681,
245
- "grad_norm": 0.21076083183288574,
246
- "learning_rate": 3.634537507231471e-05,
247
- "loss": 0.3938,
248
  "step": 17000
249
  },
250
  {
251
  "epoch": 0.5624477727068201,
252
- "grad_norm": 0.25489869713783264,
253
- "learning_rate": 3.5943626663238416e-05,
254
- "loss": 0.3921,
255
  "step": 17500
256
  },
257
  {
258
  "epoch": 0.5785177090698721,
259
- "grad_norm": 0.24057357013225555,
260
- "learning_rate": 3.5541878254162115e-05,
261
- "loss": 0.3867,
262
  "step": 18000
263
  },
264
  {
265
  "epoch": 0.5945876454329241,
266
- "grad_norm": 0.24298915266990662,
267
- "learning_rate": 3.514012984508582e-05,
268
- "loss": 0.3868,
269
  "step": 18500
270
  },
271
  {
272
  "epoch": 0.6106575817959761,
273
- "grad_norm": 0.2183919996023178,
274
- "learning_rate": 3.473838143600951e-05,
275
- "loss": 0.3803,
276
  "step": 19000
277
  },
278
  {
279
  "epoch": 0.626727518159028,
280
- "grad_norm": 0.2278251349925995,
281
- "learning_rate": 3.433663302693321e-05,
282
- "loss": 0.3775,
283
  "step": 19500
284
  },
285
  {
286
  "epoch": 0.6427974545220801,
287
- "grad_norm": 0.240201935172081,
288
- "learning_rate": 3.393568811467507e-05,
289
- "loss": 0.3751,
290
  "step": 20000
291
  },
292
  {
293
  "epoch": 0.6588673908851321,
294
- "grad_norm": 0.21118561923503876,
295
- "learning_rate": 3.353393970559877e-05,
296
- "loss": 0.3742,
297
  "step": 20500
298
  },
299
  {
300
  "epoch": 0.6749373272481841,
301
- "grad_norm": 0.22640825808048248,
302
- "learning_rate": 3.313219129652247e-05,
303
- "loss": 0.3729,
304
  "step": 21000
305
  },
306
  {
307
  "epoch": 0.6910072636112361,
308
- "grad_norm": 0.23105542361736298,
309
- "learning_rate": 3.2730442887446166e-05,
310
- "loss": 0.3687,
311
  "step": 21500
312
  },
313
  {
314
  "epoch": 0.7070771999742881,
315
- "grad_norm": 0.24791008234024048,
316
- "learning_rate": 3.2329497975188024e-05,
317
- "loss": 0.3658,
318
  "step": 22000
319
  },
320
  {
321
  "epoch": 0.7231471363373401,
322
- "grad_norm": 0.2497881054878235,
323
- "learning_rate": 3.1928553062929875e-05,
324
- "loss": 0.3646,
325
  "step": 22500
326
  },
327
  {
328
  "epoch": 0.739217072700392,
329
- "grad_norm": 0.2395261973142624,
330
- "learning_rate": 3.152680465385357e-05,
331
- "loss": 0.3655,
332
  "step": 23000
333
  },
334
  {
335
  "epoch": 0.7552870090634441,
336
- "grad_norm": 0.21194589138031006,
337
- "learning_rate": 3.112505624477727e-05,
338
- "loss": 0.3646,
339
  "step": 23500
340
  },
341
  {
342
  "epoch": 0.7713569454264961,
343
- "grad_norm": 0.21682508289813995,
344
- "learning_rate": 3.072330783570097e-05,
345
- "loss": 0.3629,
346
  "step": 24000
347
  },
348
  {
349
  "epoch": 0.7874268817895481,
350
- "grad_norm": 0.23710566759109497,
351
- "learning_rate": 3.0321559426624674e-05,
352
- "loss": 0.3583,
353
  "step": 24500
354
  },
355
  {
356
  "epoch": 0.8034968181526001,
357
- "grad_norm": 0.23857219517230988,
358
- "learning_rate": 2.9919811017548372e-05,
359
- "loss": 0.3561,
360
  "step": 25000
361
  },
362
  {
363
  "epoch": 0.8195667545156521,
364
- "grad_norm": 0.241951584815979,
365
- "learning_rate": 2.9518062608472075e-05,
366
- "loss": 0.3537,
367
  "step": 25500
368
  },
369
  {
370
  "epoch": 0.8356366908787041,
371
- "grad_norm": 0.275765061378479,
372
- "learning_rate": 2.9116314199395773e-05,
373
- "loss": 0.3493,
374
  "step": 26000
375
  },
376
  {
377
  "epoch": 0.8517066272417562,
378
- "grad_norm": 0.24757184088230133,
379
  "learning_rate": 2.871536928713762e-05,
380
- "loss": 0.3486,
381
  "step": 26500
382
  },
383
  {
384
  "epoch": 0.8677765636048081,
385
- "grad_norm": 0.21833688020706177,
386
  "learning_rate": 2.8313620878061327e-05,
387
- "loss": 0.3461,
388
  "step": 27000
389
  },
390
  {
391
  "epoch": 0.8838464999678601,
392
- "grad_norm": 0.21623168885707855,
393
  "learning_rate": 2.7911872468985022e-05,
394
- "loss": 0.3468,
395
  "step": 27500
396
  },
397
  {
398
  "epoch": 0.8999164363309121,
399
- "grad_norm": 0.20861521363258362,
400
  "learning_rate": 2.7510124059908728e-05,
401
- "loss": 0.3481,
402
  "step": 28000
403
  },
404
  {
405
  "epoch": 0.9159863726939641,
406
- "grad_norm": 0.20291315019130707,
407
- "learning_rate": 2.7108375650832423e-05,
408
- "loss": 0.3474,
409
  "step": 28500
410
  },
411
  {
412
  "epoch": 0.9320563090570161,
413
- "grad_norm": 0.2101660966873169,
414
  "learning_rate": 2.6707430738574275e-05,
415
- "loss": 0.3412,
416
  "step": 29000
417
  },
418
  {
419
  "epoch": 0.9481262454200682,
420
- "grad_norm": 0.23224739730358124,
421
  "learning_rate": 2.6305682329497977e-05,
422
- "loss": 0.3422,
423
  "step": 29500
424
  },
425
  {
426
  "epoch": 0.9641961817831202,
427
- "grad_norm": 0.22987599670886993,
428
  "learning_rate": 2.5903933920421676e-05,
429
- "loss": 0.3407,
430
  "step": 30000
431
  },
432
  {
433
  "epoch": 0.9802661181461721,
434
- "grad_norm": 0.22307533025741577,
435
- "learning_rate": 2.5502185511345378e-05,
436
- "loss": 0.3365,
437
  "step": 30500
438
  },
439
  {
440
  "epoch": 0.9963360545092241,
441
- "grad_norm": 0.20577801764011383,
442
  "learning_rate": 2.510124059908723e-05,
443
- "loss": 0.3409,
444
  "step": 31000
445
  },
446
  {
447
  "epoch": 1.0124059908722762,
448
- "grad_norm": 0.23968417942523956,
449
  "learning_rate": 2.4699492190010928e-05,
450
- "loss": 0.339,
451
  "step": 31500
452
  },
453
  {
454
  "epoch": 1.028475927235328,
455
- "grad_norm": 0.2166174054145813,
456
  "learning_rate": 2.429774378093463e-05,
457
- "loss": 0.3317,
458
  "step": 32000
459
  },
460
  {
461
  "epoch": 1.0445458635983802,
462
- "grad_norm": 0.22259151935577393,
463
  "learning_rate": 2.389599537185833e-05,
464
- "loss": 0.3404,
465
  "step": 32500
466
  },
467
  {
468
  "epoch": 1.060615799961432,
469
- "grad_norm": 0.2585219442844391,
470
- "learning_rate": 2.3495050459600184e-05,
471
- "loss": 0.3322,
472
  "step": 33000
473
  },
474
  {
475
  "epoch": 1.0766857363244842,
476
- "grad_norm": 0.23949937522411346,
477
- "learning_rate": 2.3093302050523882e-05,
478
- "loss": 0.3332,
479
  "step": 33500
480
  },
481
  {
482
  "epoch": 1.0927556726875363,
483
- "grad_norm": 0.2360944151878357,
484
  "learning_rate": 2.269155364144758e-05,
485
- "loss": 0.3374,
486
  "step": 34000
487
  },
488
  {
489
  "epoch": 1.1088256090505881,
490
- "grad_norm": 0.23383018374443054,
491
  "learning_rate": 2.228980523237128e-05,
492
- "loss": 0.3287,
493
  "step": 34500
494
  },
495
  {
496
  "epoch": 1.1248955454136402,
497
- "grad_norm": 0.25602060556411743,
498
- "learning_rate": 2.1888860320113135e-05,
499
- "loss": 0.3262,
500
  "step": 35000
501
  },
502
  {
503
  "epoch": 1.140965481776692,
504
- "grad_norm": 0.2233658730983734,
505
- "learning_rate": 2.1487111911036833e-05,
506
- "loss": 0.3294,
507
  "step": 35500
508
  },
509
  {
510
  "epoch": 1.1570354181397442,
511
- "grad_norm": 0.23545712232589722,
512
- "learning_rate": 2.1085363501960532e-05,
513
- "loss": 0.3263,
514
  "step": 36000
515
  },
516
  {
517
  "epoch": 1.173105354502796,
518
- "grad_norm": 0.22479598224163055,
519
- "learning_rate": 2.0683615092884234e-05,
520
- "loss": 0.328,
521
  "step": 36500
522
  },
523
  {
524
  "epoch": 1.1891752908658482,
525
- "grad_norm": 0.22207121551036835,
526
- "learning_rate": 2.0282670180626086e-05,
527
- "loss": 0.3275,
528
  "step": 37000
529
  },
530
  {
531
  "epoch": 1.2052452272289003,
532
- "grad_norm": 0.23822110891342163,
533
- "learning_rate": 1.9880921771549785e-05,
534
- "loss": 0.3273,
535
  "step": 37500
536
  },
537
  {
538
  "epoch": 1.2213151635919521,
539
- "grad_norm": 0.23664866387844086,
540
- "learning_rate": 1.9479173362473487e-05,
541
- "loss": 0.318,
542
  "step": 38000
543
  },
544
  {
545
  "epoch": 1.2373850999550042,
546
- "grad_norm": 0.18543508648872375,
547
- "learning_rate": 1.9077424953397185e-05,
548
- "loss": 0.3235,
549
  "step": 38500
550
  },
551
  {
552
  "epoch": 1.253455036318056,
553
- "grad_norm": 0.23305822908878326,
554
- "learning_rate": 1.8676480041139037e-05,
555
- "loss": 0.3243,
556
  "step": 39000
557
  },
558
  {
559
  "epoch": 1.2695249726811082,
560
- "grad_norm": 0.21699073910713196,
561
- "learning_rate": 1.827473163206274e-05,
562
- "loss": 0.3222,
563
  "step": 39500
564
  },
565
  {
566
  "epoch": 1.28559490904416,
567
- "grad_norm": 0.2757895588874817,
568
  "learning_rate": 1.7872983222986438e-05,
569
- "loss": 0.3248,
570
  "step": 40000
571
  },
572
  {
573
  "epoch": 1.3016648454072122,
574
- "grad_norm": 0.19769324362277985,
575
  "learning_rate": 1.7471234813910137e-05,
576
- "loss": 0.3179,
577
  "step": 40500
578
  },
579
  {
580
  "epoch": 1.3177347817702643,
581
- "grad_norm": 0.18964402377605438,
582
- "learning_rate": 1.707028990165199e-05,
583
- "loss": 0.3178,
584
  "step": 41000
585
  },
586
  {
587
  "epoch": 1.3338047181333161,
588
- "grad_norm": 0.2584107220172882,
589
- "learning_rate": 1.666854149257569e-05,
590
- "loss": 0.318,
591
  "step": 41500
592
  },
593
  {
594
  "epoch": 1.3498746544963682,
595
- "grad_norm": 0.25919750332832336,
596
- "learning_rate": 1.626759658031754e-05,
597
- "loss": 0.3205,
598
  "step": 42000
599
  },
600
  {
601
  "epoch": 1.3659445908594203,
602
- "grad_norm": 0.24371759593486786,
603
- "learning_rate": 1.5865848171241244e-05,
604
- "loss": 0.3186,
605
  "step": 42500
606
  },
607
  {
608
  "epoch": 1.3820145272224722,
609
- "grad_norm": 0.24457883834838867,
610
- "learning_rate": 1.5464099762164942e-05,
611
- "loss": 0.3162,
612
  "step": 43000
613
  },
614
  {
615
  "epoch": 1.398084463585524,
616
- "grad_norm": 0.1918337345123291,
617
- "learning_rate": 1.5062351353088641e-05,
618
- "loss": 0.3169,
619
  "step": 43500
620
  },
621
  {
622
  "epoch": 1.4141543999485762,
623
- "grad_norm": 0.2350657880306244,
624
- "learning_rate": 1.4660602944012342e-05,
625
- "loss": 0.3171,
626
  "step": 44000
627
  },
628
  {
629
  "epoch": 1.4302243363116283,
630
- "grad_norm": 0.2481279820203781,
631
- "learning_rate": 1.4258854534936042e-05,
632
- "loss": 0.3179,
633
  "step": 44500
634
  },
635
  {
636
  "epoch": 1.4462942726746801,
637
- "grad_norm": 0.21132701635360718,
638
- "learning_rate": 1.3857106125859743e-05,
639
- "loss": 0.3125,
640
  "step": 45000
641
  },
642
  {
643
  "epoch": 1.4623642090377322,
644
- "grad_norm": 0.20240716636180878,
645
- "learning_rate": 1.3455357716783443e-05,
646
- "loss": 0.3172,
647
  "step": 45500
648
  },
649
  {
650
  "epoch": 1.4784341454007843,
651
- "grad_norm": 0.2224823385477066,
652
- "learning_rate": 1.3054412804525296e-05,
653
- "loss": 0.3151,
654
  "step": 46000
655
  },
656
  {
657
  "epoch": 1.4945040817638362,
658
- "grad_norm": 0.19261781871318817,
659
  "learning_rate": 1.2652664395448997e-05,
660
- "loss": 0.312,
661
  "step": 46500
662
  },
663
  {
664
  "epoch": 1.510574018126888,
665
- "grad_norm": 0.16068917512893677,
666
  "learning_rate": 1.2250915986372695e-05,
667
- "loss": 0.3145,
668
  "step": 47000
669
  },
670
  {
671
  "epoch": 1.5266439544899402,
672
- "grad_norm": 0.18192972242832184,
673
  "learning_rate": 1.1849167577296394e-05,
674
- "loss": 0.3134,
675
  "step": 47500
676
  },
677
  {
678
  "epoch": 1.5427138908529923,
679
- "grad_norm": 0.19884943962097168,
680
- "learning_rate": 1.1448222665038247e-05,
681
- "loss": 0.3119,
682
  "step": 48000
683
  },
684
  {
685
  "epoch": 1.5587838272160441,
686
- "grad_norm": 0.1883106529712677,
687
- "learning_rate": 1.1046474255961948e-05,
688
- "loss": 0.316,
689
  "step": 48500
690
  },
691
  {
692
  "epoch": 1.5748537635790962,
693
- "grad_norm": 0.19331087172031403,
694
- "learning_rate": 1.0644725846885646e-05,
695
- "loss": 0.3135,
696
  "step": 49000
697
  },
698
  {
699
  "epoch": 1.5909236999421483,
700
- "grad_norm": 0.20041531324386597,
701
  "learning_rate": 1.0242977437809347e-05,
702
- "loss": 0.3112,
703
  "step": 49500
704
  },
705
  {
706
  "epoch": 1.6069936363052002,
707
- "grad_norm": 0.18530187010765076,
708
- "learning_rate": 9.8420325255512e-06,
709
- "loss": 0.3122,
710
  "step": 50000
711
  },
712
  {
713
  "epoch": 1.623063572668252,
714
- "grad_norm": 0.22725620865821838,
715
- "learning_rate": 9.4402841164749e-06,
716
- "loss": 0.3122,
717
  "step": 50500
718
  },
719
  {
720
  "epoch": 1.6391335090313044,
721
- "grad_norm": 0.23093479871749878,
722
- "learning_rate": 9.0385357073986e-06,
723
- "loss": 0.3149,
724
  "step": 51000
725
  },
726
  {
727
  "epoch": 1.6552034453943563,
728
- "grad_norm": 0.19580845534801483,
729
- "learning_rate": 8.6367872983223e-06,
730
- "loss": 0.3121,
731
  "step": 51500
732
  },
733
  {
734
  "epoch": 1.6712733817574081,
735
- "grad_norm": 0.1742846667766571,
736
- "learning_rate": 8.235842386064153e-06,
737
- "loss": 0.3094,
738
  "step": 52000
739
  },
740
  {
741
  "epoch": 1.6873433181204602,
742
- "grad_norm": 0.18685191869735718,
743
- "learning_rate": 7.834093976987852e-06,
744
- "loss": 0.309,
745
  "step": 52500
746
  },
747
  {
748
  "epoch": 1.7034132544835123,
749
- "grad_norm": 0.21959276497364044,
750
- "learning_rate": 7.432345567911551e-06,
751
- "loss": 0.3118,
752
  "step": 53000
753
  },
754
  {
755
  "epoch": 1.7194831908465642,
756
- "grad_norm": 0.1935770958662033,
757
- "learning_rate": 7.030597158835252e-06,
758
- "loss": 0.3106,
759
  "step": 53500
760
  },
761
  {
762
  "epoch": 1.7355531272096163,
763
- "grad_norm": 0.19977129995822906,
764
- "learning_rate": 6.629652246577103e-06,
765
- "loss": 0.3101,
766
  "step": 54000
767
  },
768
  {
769
  "epoch": 1.7516230635726684,
770
- "grad_norm": 0.2006288766860962,
771
- "learning_rate": 6.2279038375008035e-06,
772
- "loss": 0.3099,
773
  "step": 54500
774
  },
775
  {
776
  "epoch": 1.7676929999357203,
777
- "grad_norm": 0.19280743598937988,
778
- "learning_rate": 5.826155428424504e-06,
779
- "loss": 0.308,
780
  "step": 55000
781
  },
782
  {
783
  "epoch": 1.7837629362987721,
784
- "grad_norm": 0.22095157206058502,
785
- "learning_rate": 5.424407019348204e-06,
786
- "loss": 0.3069,
787
  "step": 55500
788
  },
789
  {
790
  "epoch": 1.7998328726618242,
791
- "grad_norm": 0.2091740071773529,
792
  "learning_rate": 5.022658610271903e-06,
793
- "loss": 0.3062,
794
  "step": 56000
795
  },
796
  {
797
  "epoch": 1.8159028090248763,
798
- "grad_norm": 0.24772244691848755,
799
  "learning_rate": 4.620910201195604e-06,
800
- "loss": 0.3093,
801
  "step": 56500
802
  },
803
  {
804
  "epoch": 1.8319727453879282,
805
- "grad_norm": 0.1973961740732193,
806
  "learning_rate": 4.219161792119303e-06,
807
- "loss": 0.309,
808
  "step": 57000
809
  },
810
  {
811
  "epoch": 1.8480426817509803,
812
- "grad_norm": 0.22767914831638336,
813
  "learning_rate": 3.817413383043003e-06,
814
- "loss": 0.3109,
815
  "step": 57500
816
  },
817
  {
818
  "epoch": 1.8641126181140324,
819
- "grad_norm": 0.21461111307144165,
820
- "learning_rate": 3.416468470784856e-06,
821
- "loss": 0.3075,
822
  "step": 58000
823
  },
824
  {
825
  "epoch": 1.8801825544770843,
826
- "grad_norm": 0.24607454240322113,
827
- "learning_rate": 3.0147200617085557e-06,
828
- "loss": 0.3058,
829
  "step": 58500
830
  },
831
  {
832
  "epoch": 1.8962524908401361,
833
- "grad_norm": 0.19667118787765503,
834
  "learning_rate": 2.6129716526322558e-06,
835
- "loss": 0.3072,
836
  "step": 59000
837
  },
838
  {
839
  "epoch": 1.9123224272031882,
840
- "grad_norm": 0.22604137659072876,
841
  "learning_rate": 2.211223243555956e-06,
842
- "loss": 0.3064,
843
  "step": 59500
844
  },
845
  {
846
  "epoch": 1.9283923635662403,
847
- "grad_norm": 0.1879967898130417,
848
- "learning_rate": 1.8102783312978082e-06,
849
- "loss": 0.3063,
850
  "step": 60000
851
  },
852
  {
853
  "epoch": 1.9444622999292922,
854
- "grad_norm": 0.21271295845508575,
855
- "learning_rate": 1.408529922221508e-06,
856
- "loss": 0.3076,
857
  "step": 60500
858
  },
859
  {
860
  "epoch": 1.9605322362923443,
861
- "grad_norm": 0.16714586317539215,
862
- "learning_rate": 1.006781513145208e-06,
863
- "loss": 0.3092,
864
  "step": 61000
865
  },
866
  {
867
  "epoch": 1.9766021726553964,
868
- "grad_norm": 0.20666128396987915,
869
- "learning_rate": 6.050331040689079e-07,
870
- "loss": 0.3076,
871
  "step": 61500
872
  },
873
  {
874
  "epoch": 1.9926721090184483,
875
- "grad_norm": 0.18590718507766724,
876
- "learning_rate": 2.0328469499260785e-07,
877
- "loss": 0.3063,
878
  "step": 62000
879
  }
880
  ],
@@ -895,7 +895,7 @@
895
  "attributes": {}
896
  }
897
  },
898
- "total_flos": 1.3425893171842253e+17,
899
  "train_batch_size": 32,
900
  "trial_name": null,
901
  "trial_params": null
 
11
  "log_history": [
12
  {
13
  "epoch": 0.016069936363052,
14
+ "grad_norm": 0.2569522559642792,
15
+ "learning_rate": 4.960307257183262e-05,
16
+ "loss": 2.9119,
17
  "step": 500
18
  },
19
  {
20
  "epoch": 0.032139872726104,
21
+ "grad_norm": 0.26731985807418823,
22
+ "learning_rate": 4.9201324162756315e-05,
23
+ "loss": 2.2886,
24
  "step": 1000
25
  },
26
  {
27
  "epoch": 0.04820980908915601,
28
+ "grad_norm": 0.3099210560321808,
29
+ "learning_rate": 4.8799575753680014e-05,
30
+ "loss": 2.1431,
31
  "step": 1500
32
  },
33
  {
34
  "epoch": 0.064279745452208,
35
+ "grad_norm": 0.28836730122566223,
36
+ "learning_rate": 4.839782734460372e-05,
37
+ "loss": 2.0369,
38
  "step": 2000
39
  },
40
  {
41
  "epoch": 0.08034968181526002,
42
+ "grad_norm": 0.4808545708656311,
43
+ "learning_rate": 4.799607893552742e-05,
44
+ "loss": 1.932,
45
  "step": 2500
46
  },
47
  {
48
  "epoch": 0.09641961817831202,
49
+ "grad_norm": 0.38000208139419556,
50
+ "learning_rate": 4.759433052645112e-05,
51
+ "loss": 1.7766,
52
  "step": 3000
53
  },
54
  {
55
  "epoch": 0.11248955454136401,
56
+ "grad_norm": 0.4310196340084076,
57
+ "learning_rate": 4.7192582117374816e-05,
58
+ "loss": 1.6022,
59
  "step": 3500
60
  },
61
  {
62
  "epoch": 0.128559490904416,
63
+ "grad_norm": 0.40425005555152893,
64
+ "learning_rate": 4.6790833708298515e-05,
65
+ "loss": 1.4576,
66
  "step": 4000
67
  },
68
  {
69
  "epoch": 0.14462942726746802,
70
+ "grad_norm": 0.3811793327331543,
71
+ "learning_rate": 4.638908529922222e-05,
72
+ "loss": 1.3384,
73
  "step": 4500
74
  },
75
  {
76
  "epoch": 0.16069936363052004,
77
+ "grad_norm": 0.38943949341773987,
78
+ "learning_rate": 4.598733689014591e-05,
79
+ "loss": 1.2233,
80
  "step": 5000
81
  },
82
  {
83
  "epoch": 0.17676929999357202,
84
+ "grad_norm": 0.5517480373382568,
85
+ "learning_rate": 4.558558848106962e-05,
86
+ "loss": 1.1342,
87
  "step": 5500
88
  },
89
  {
90
  "epoch": 0.19283923635662403,
91
+ "grad_norm": 0.4235232174396515,
92
+ "learning_rate": 4.518384007199332e-05,
93
+ "loss": 1.0432,
94
  "step": 6000
95
  },
96
  {
97
  "epoch": 0.20890917271967602,
98
+ "grad_norm": 0.4617592692375183,
99
+ "learning_rate": 4.478209166291702e-05,
100
+ "loss": 0.9781,
101
  "step": 6500
102
  },
103
  {
104
  "epoch": 0.22497910908272803,
105
+ "grad_norm": 0.5447149872779846,
106
+ "learning_rate": 4.4380343253840714e-05,
107
+ "loss": 0.927,
108
  "step": 7000
109
  },
110
  {
111
  "epoch": 0.24104904544578004,
112
+ "grad_norm": 0.4740816354751587,
113
+ "learning_rate": 4.397859484476441e-05,
114
+ "loss": 0.8674,
115
  "step": 7500
116
  },
117
  {
118
  "epoch": 0.257118981808832,
119
+ "grad_norm": 0.5207423567771912,
120
+ "learning_rate": 4.357684643568812e-05,
121
+ "loss": 0.8149,
122
  "step": 8000
123
  },
124
  {
125
  "epoch": 0.27318891817188407,
126
+ "grad_norm": 0.47738897800445557,
127
+ "learning_rate": 4.317509802661182e-05,
128
+ "loss": 0.7685,
129
  "step": 8500
130
  },
131
  {
132
  "epoch": 0.28925885453493605,
133
+ "grad_norm": 0.4176841676235199,
134
+ "learning_rate": 4.2773349617535516e-05,
135
+ "loss": 0.7119,
136
  "step": 9000
137
  },
138
  {
139
  "epoch": 0.30532879089798803,
140
+ "grad_norm": 0.381345272064209,
141
+ "learning_rate": 4.2371601208459215e-05,
142
+ "loss": 0.6682,
143
  "step": 9500
144
  },
145
  {
146
  "epoch": 0.3213987272610401,
147
+ "grad_norm": 0.6301918625831604,
148
+ "learning_rate": 4.1969852799382914e-05,
149
+ "loss": 0.6505,
150
  "step": 10000
151
  },
152
  {
153
  "epoch": 0.33746866362409206,
154
+ "grad_norm": 0.4057278335094452,
155
+ "learning_rate": 4.156810439030662e-05,
156
+ "loss": 0.6063,
157
  "step": 10500
158
  },
159
  {
160
  "epoch": 0.35353859998714404,
161
+ "grad_norm": 0.5442121624946594,
162
+ "learning_rate": 4.116635598123031e-05,
163
+ "loss": 0.5735,
164
  "step": 11000
165
  },
166
  {
167
  "epoch": 0.369608536350196,
168
+ "grad_norm": 0.5113051533699036,
169
+ "learning_rate": 4.076460757215402e-05,
170
+ "loss": 0.5432,
171
  "step": 11500
172
  },
173
  {
174
  "epoch": 0.38567847271324807,
175
+ "grad_norm": 0.6383316516876221,
176
+ "learning_rate": 4.0362859163077716e-05,
177
+ "loss": 0.5143,
178
  "step": 12000
179
  },
180
  {
181
  "epoch": 0.40174840907630005,
182
+ "grad_norm": 0.4316321611404419,
183
+ "learning_rate": 3.996111075400142e-05,
184
+ "loss": 0.4867,
185
  "step": 12500
186
  },
187
  {
188
  "epoch": 0.41781834543935203,
189
+ "grad_norm": 0.42703017592430115,
190
+ "learning_rate": 3.955936234492511e-05,
191
+ "loss": 0.4614,
192
  "step": 13000
193
  },
194
  {
195
  "epoch": 0.4338882818024041,
196
+ "grad_norm": 0.4263227880001068,
197
+ "learning_rate": 3.915761393584881e-05,
198
+ "loss": 0.4391,
199
  "step": 13500
200
  },
201
  {
202
  "epoch": 0.44995821816545606,
203
+ "grad_norm": 0.47577473521232605,
204
+ "learning_rate": 3.875586552677252e-05,
205
+ "loss": 0.4241,
206
  "step": 14000
207
  },
208
  {
209
  "epoch": 0.46602815452850804,
210
+ "grad_norm": 0.3419073224067688,
211
+ "learning_rate": 3.8354117117696216e-05,
212
+ "loss": 0.4019,
213
  "step": 14500
214
  },
215
  {
216
  "epoch": 0.4820980908915601,
217
+ "grad_norm": 0.3402538001537323,
218
+ "learning_rate": 3.7952368708619915e-05,
219
+ "loss": 0.3876,
220
  "step": 15000
221
  },
222
  {
223
  "epoch": 0.49816802725461207,
224
+ "grad_norm": 0.7072747349739075,
225
+ "learning_rate": 3.7550620299543614e-05,
226
+ "loss": 0.364,
227
  "step": 15500
228
  },
229
  {
230
  "epoch": 0.514237963617664,
231
+ "grad_norm": 0.31305554509162903,
232
  "learning_rate": 3.714887189046731e-05,
233
+ "loss": 0.3463,
234
  "step": 16000
235
  },
236
  {
237
  "epoch": 0.530307899980716,
238
+ "grad_norm": 0.4203876554965973,
239
+ "learning_rate": 3.674792697820917e-05,
240
+ "loss": 0.3371,
241
  "step": 16500
242
  },
243
  {
244
  "epoch": 0.5463778363437681,
245
+ "grad_norm": 0.49149152636528015,
246
+ "learning_rate": 3.634617856913286e-05,
247
+ "loss": 0.3189,
248
  "step": 17000
249
  },
250
  {
251
  "epoch": 0.5624477727068201,
252
+ "grad_norm": 0.6438118815422058,
253
+ "learning_rate": 3.594443016005657e-05,
254
+ "loss": 0.3074,
255
  "step": 17500
256
  },
257
  {
258
  "epoch": 0.5785177090698721,
259
+ "grad_norm": 0.6619039177894592,
260
+ "learning_rate": 3.554268175098027e-05,
261
+ "loss": 0.2989,
262
  "step": 18000
263
  },
264
  {
265
  "epoch": 0.5945876454329241,
266
+ "grad_norm": 0.39272341132164,
267
+ "learning_rate": 3.514093334190397e-05,
268
+ "loss": 0.2818,
269
  "step": 18500
270
  },
271
  {
272
  "epoch": 0.6106575817959761,
273
+ "grad_norm": 0.3980565369129181,
274
+ "learning_rate": 3.473998842964582e-05,
275
+ "loss": 0.273,
276
  "step": 19000
277
  },
278
  {
279
  "epoch": 0.626727518159028,
280
+ "grad_norm": 0.3052268922328949,
281
+ "learning_rate": 3.4338240020569516e-05,
282
+ "loss": 0.2677,
283
  "step": 19500
284
  },
285
  {
286
  "epoch": 0.6427974545220801,
287
+ "grad_norm": 0.5999760031700134,
288
+ "learning_rate": 3.3937295108311374e-05,
289
+ "loss": 0.2572,
290
  "step": 20000
291
  },
292
  {
293
  "epoch": 0.6588673908851321,
294
+ "grad_norm": 0.4283508062362671,
295
+ "learning_rate": 3.3536350196053226e-05,
296
+ "loss": 0.2468,
297
  "step": 20500
298
  },
299
  {
300
  "epoch": 0.6749373272481841,
301
+ "grad_norm": 0.4289894700050354,
302
+ "learning_rate": 3.3134601786976924e-05,
303
+ "loss": 0.2414,
304
  "step": 21000
305
  },
306
  {
307
  "epoch": 0.6910072636112361,
308
+ "grad_norm": 0.26386120915412903,
309
+ "learning_rate": 3.273285337790062e-05,
310
+ "loss": 0.2422,
311
  "step": 21500
312
  },
313
  {
314
  "epoch": 0.7070771999742881,
315
+ "grad_norm": 0.41095244884490967,
316
+ "learning_rate": 3.233110496882433e-05,
317
+ "loss": 0.2282,
318
  "step": 22000
319
  },
320
  {
321
  "epoch": 0.7231471363373401,
322
+ "grad_norm": 0.29514652490615845,
323
+ "learning_rate": 3.192935655974803e-05,
324
+ "loss": 0.2252,
325
  "step": 22500
326
  },
327
  {
328
  "epoch": 0.739217072700392,
329
+ "grad_norm": 0.4044126570224762,
330
+ "learning_rate": 3.152760815067172e-05,
331
+ "loss": 0.2211,
332
  "step": 23000
333
  },
334
  {
335
  "epoch": 0.7552870090634441,
336
+ "grad_norm": 0.3767038881778717,
337
+ "learning_rate": 3.1125859741595425e-05,
338
+ "loss": 0.2115,
339
  "step": 23500
340
  },
341
  {
342
  "epoch": 0.7713569454264961,
343
+ "grad_norm": 0.36812517046928406,
344
+ "learning_rate": 3.0724111332519124e-05,
345
+ "loss": 0.2059,
346
  "step": 24000
347
  },
348
  {
349
  "epoch": 0.7874268817895481,
350
+ "grad_norm": 0.3709106147289276,
351
+ "learning_rate": 3.0322362923442826e-05,
352
+ "loss": 0.2035,
353
  "step": 24500
354
  },
355
  {
356
  "epoch": 0.8034968181526001,
357
+ "grad_norm": 0.3285115361213684,
358
+ "learning_rate": 2.9920614514366525e-05,
359
+ "loss": 0.1993,
360
  "step": 25000
361
  },
362
  {
363
  "epoch": 0.8195667545156521,
364
+ "grad_norm": 0.3229790925979614,
365
+ "learning_rate": 2.9518866105290227e-05,
366
+ "loss": 0.1968,
367
  "step": 25500
368
  },
369
  {
370
  "epoch": 0.8356366908787041,
371
+ "grad_norm": 0.37397509813308716,
372
+ "learning_rate": 2.9117117696213926e-05,
373
+ "loss": 0.194,
374
  "step": 26000
375
  },
376
  {
377
  "epoch": 0.8517066272417562,
378
+ "grad_norm": 0.33143311738967896,
379
  "learning_rate": 2.871536928713762e-05,
380
+ "loss": 0.1875,
381
  "step": 26500
382
  },
383
  {
384
  "epoch": 0.8677765636048081,
385
+ "grad_norm": 0.2748125493526459,
386
  "learning_rate": 2.8313620878061327e-05,
387
+ "loss": 0.1854,
388
  "step": 27000
389
  },
390
  {
391
  "epoch": 0.8838464999678601,
392
+ "grad_norm": 0.2606910169124603,
393
  "learning_rate": 2.7911872468985022e-05,
394
+ "loss": 0.1809,
395
  "step": 27500
396
  },
397
  {
398
  "epoch": 0.8999164363309121,
399
+ "grad_norm": 0.28182655572891235,
400
  "learning_rate": 2.7510124059908728e-05,
401
+ "loss": 0.1815,
402
  "step": 28000
403
  },
404
  {
405
  "epoch": 0.9159863726939641,
406
+ "grad_norm": 0.3056446313858032,
407
+ "learning_rate": 2.7109179147650576e-05,
408
+ "loss": 0.1775,
409
  "step": 28500
410
  },
411
  {
412
  "epoch": 0.9320563090570161,
413
+ "grad_norm": 0.2458430379629135,
414
  "learning_rate": 2.6707430738574275e-05,
415
+ "loss": 0.1714,
416
  "step": 29000
417
  },
418
  {
419
  "epoch": 0.9481262454200682,
420
+ "grad_norm": 0.2681204080581665,
421
  "learning_rate": 2.6305682329497977e-05,
422
+ "loss": 0.1734,
423
  "step": 29500
424
  },
425
  {
426
  "epoch": 0.9641961817831202,
427
+ "grad_norm": 0.38170355558395386,
428
  "learning_rate": 2.5903933920421676e-05,
429
+ "loss": 0.1701,
430
  "step": 30000
431
  },
432
  {
433
  "epoch": 0.9802661181461721,
434
+ "grad_norm": 0.43841251730918884,
435
+ "learning_rate": 2.550298900816353e-05,
436
+ "loss": 0.1656,
437
  "step": 30500
438
  },
439
  {
440
  "epoch": 0.9963360545092241,
441
+ "grad_norm": 0.4082754850387573,
442
  "learning_rate": 2.510124059908723e-05,
443
+ "loss": 0.1649,
444
  "step": 31000
445
  },
446
  {
447
  "epoch": 1.0124059908722762,
448
+ "grad_norm": 0.27510714530944824,
449
  "learning_rate": 2.4699492190010928e-05,
450
+ "loss": 0.1636,
451
  "step": 31500
452
  },
453
  {
454
  "epoch": 1.028475927235328,
455
+ "grad_norm": 0.3550429344177246,
456
  "learning_rate": 2.429774378093463e-05,
457
+ "loss": 0.1615,
458
  "step": 32000
459
  },
460
  {
461
  "epoch": 1.0445458635983802,
462
+ "grad_norm": 0.382055401802063,
463
  "learning_rate": 2.389599537185833e-05,
464
+ "loss": 0.1597,
465
  "step": 32500
466
  },
467
  {
468
  "epoch": 1.060615799961432,
469
+ "grad_norm": 0.38698843121528625,
470
+ "learning_rate": 2.349424696278203e-05,
471
+ "loss": 0.155,
472
  "step": 33000
473
  },
474
  {
475
  "epoch": 1.0766857363244842,
476
+ "grad_norm": 0.380403995513916,
477
+ "learning_rate": 2.309249855370573e-05,
478
+ "loss": 0.1594,
479
  "step": 33500
480
  },
481
  {
482
  "epoch": 1.0927556726875363,
483
+ "grad_norm": 0.17210371792316437,
484
  "learning_rate": 2.269155364144758e-05,
485
+ "loss": 0.1543,
486
  "step": 34000
487
  },
488
  {
489
  "epoch": 1.1088256090505881,
490
+ "grad_norm": 0.33378392457962036,
491
  "learning_rate": 2.228980523237128e-05,
492
+ "loss": 0.1549,
493
  "step": 34500
494
  },
495
  {
496
  "epoch": 1.1248955454136402,
497
+ "grad_norm": 0.282175213098526,
498
+ "learning_rate": 2.1888056823294982e-05,
499
+ "loss": 0.1509,
500
  "step": 35000
501
  },
502
  {
503
  "epoch": 1.140965481776692,
504
+ "grad_norm": 0.4829972982406616,
505
+ "learning_rate": 2.148630841421868e-05,
506
+ "loss": 0.1508,
507
  "step": 35500
508
  },
509
  {
510
  "epoch": 1.1570354181397442,
511
+ "grad_norm": 0.4101378321647644,
512
+ "learning_rate": 2.1084560005142383e-05,
513
+ "loss": 0.1487,
514
  "step": 36000
515
  },
516
  {
517
  "epoch": 1.173105354502796,
518
+ "grad_norm": 0.24467173218727112,
519
+ "learning_rate": 2.0682811596066082e-05,
520
+ "loss": 0.1482,
521
  "step": 36500
522
  },
523
  {
524
  "epoch": 1.1891752908658482,
525
+ "grad_norm": 0.2552469074726105,
526
+ "learning_rate": 2.028106318698978e-05,
527
+ "loss": 0.1474,
528
  "step": 37000
529
  },
530
  {
531
  "epoch": 1.2052452272289003,
532
+ "grad_norm": 0.33155035972595215,
533
+ "learning_rate": 1.987931477791348e-05,
534
+ "loss": 0.1427,
535
  "step": 37500
536
  },
537
  {
538
  "epoch": 1.2213151635919521,
539
+ "grad_norm": 0.41133707761764526,
540
+ "learning_rate": 1.9478369865655334e-05,
541
+ "loss": 0.143,
542
  "step": 38000
543
  },
544
  {
545
  "epoch": 1.2373850999550042,
546
+ "grad_norm": 0.36144211888313293,
547
+ "learning_rate": 1.9076621456579033e-05,
548
+ "loss": 0.1387,
549
  "step": 38500
550
  },
551
  {
552
  "epoch": 1.253455036318056,
553
+ "grad_norm": 0.36597776412963867,
554
+ "learning_rate": 1.8674873047502732e-05,
555
+ "loss": 0.1415,
556
  "step": 39000
557
  },
558
  {
559
  "epoch": 1.2695249726811082,
560
+ "grad_norm": 0.37640953063964844,
561
+ "learning_rate": 1.8273124638426434e-05,
562
+ "loss": 0.1408,
563
  "step": 39500
564
  },
565
  {
566
  "epoch": 1.28559490904416,
567
+ "grad_norm": 0.22886815667152405,
568
  "learning_rate": 1.7872983222986438e-05,
569
+ "loss": 0.1366,
570
  "step": 40000
571
  },
572
  {
573
  "epoch": 1.3016648454072122,
574
+ "grad_norm": 0.44980695843696594,
575
  "learning_rate": 1.7471234813910137e-05,
576
+ "loss": 0.1411,
577
  "step": 40500
578
  },
579
  {
580
  "epoch": 1.3177347817702643,
581
+ "grad_norm": 0.46285852789878845,
582
+ "learning_rate": 1.706948640483384e-05,
583
+ "loss": 0.1367,
584
  "step": 41000
585
  },
586
  {
587
  "epoch": 1.3338047181333161,
588
+ "grad_norm": 0.1757335215806961,
589
+ "learning_rate": 1.6667737995757538e-05,
590
+ "loss": 0.1361,
591
  "step": 41500
592
  },
593
  {
594
  "epoch": 1.3498746544963682,
595
+ "grad_norm": 0.28056710958480835,
596
+ "learning_rate": 1.6265989586681236e-05,
597
+ "loss": 0.1371,
598
  "step": 42000
599
  },
600
  {
601
  "epoch": 1.3659445908594203,
602
+ "grad_norm": 0.4234681725502014,
603
+ "learning_rate": 1.586424117760494e-05,
604
+ "loss": 0.1363,
605
  "step": 42500
606
  },
607
  {
608
  "epoch": 1.3820145272224722,
609
+ "grad_norm": 0.2925218641757965,
610
+ "learning_rate": 1.5462492768528637e-05,
611
+ "loss": 0.1336,
612
  "step": 43000
613
  },
614
  {
615
  "epoch": 1.398084463585524,
616
+ "grad_norm": 0.23110254108905792,
617
+ "learning_rate": 1.5060744359452336e-05,
618
+ "loss": 0.1305,
619
  "step": 43500
620
  },
621
  {
622
  "epoch": 1.4141543999485762,
623
+ "grad_norm": 0.4187003970146179,
624
+ "learning_rate": 1.4659799447194189e-05,
625
+ "loss": 0.1374,
626
  "step": 44000
627
  },
628
  {
629
  "epoch": 1.4302243363116283,
630
+ "grad_norm": 0.30868059396743774,
631
+ "learning_rate": 1.425805103811789e-05,
632
+ "loss": 0.1332,
633
  "step": 44500
634
  },
635
  {
636
  "epoch": 1.4462942726746801,
637
+ "grad_norm": 0.24373352527618408,
638
+ "learning_rate": 1.385630262904159e-05,
639
+ "loss": 0.133,
640
  "step": 45000
641
  },
642
  {
643
  "epoch": 1.4623642090377322,
644
+ "grad_norm": 0.3976458013057709,
645
+ "learning_rate": 1.345455421996529e-05,
646
+ "loss": 0.1317,
647
  "step": 45500
648
  },
649
  {
650
  "epoch": 1.4784341454007843,
651
+ "grad_norm": 0.15130922198295593,
652
+ "learning_rate": 1.3053609307707144e-05,
653
+ "loss": 0.1294,
654
  "step": 46000
655
  },
656
  {
657
  "epoch": 1.4945040817638362,
658
+ "grad_norm": 0.26361921429634094,
659
  "learning_rate": 1.2652664395448997e-05,
660
+ "loss": 0.1316,
661
  "step": 46500
662
  },
663
  {
664
  "epoch": 1.510574018126888,
665
+ "grad_norm": 0.3039293587207794,
666
  "learning_rate": 1.2250915986372695e-05,
667
+ "loss": 0.1294,
668
  "step": 47000
669
  },
670
  {
671
  "epoch": 1.5266439544899402,
672
+ "grad_norm": 0.23085398972034454,
673
  "learning_rate": 1.1849167577296394e-05,
674
+ "loss": 0.1304,
675
  "step": 47500
676
  },
677
  {
678
  "epoch": 1.5427138908529923,
679
+ "grad_norm": 0.45066356658935547,
680
+ "learning_rate": 1.1447419168220095e-05,
681
+ "loss": 0.1283,
682
  "step": 48000
683
  },
684
  {
685
  "epoch": 1.5587838272160441,
686
+ "grad_norm": 0.2428194135427475,
687
+ "learning_rate": 1.1045670759143795e-05,
688
+ "loss": 0.1279,
689
  "step": 48500
690
  },
691
  {
692
  "epoch": 1.5748537635790962,
693
+ "grad_norm": 0.15587645769119263,
694
+ "learning_rate": 1.0643922350067494e-05,
695
+ "loss": 0.1273,
696
  "step": 49000
697
  },
698
  {
699
  "epoch": 1.5909236999421483,
700
+ "grad_norm": 0.5055563449859619,
701
  "learning_rate": 1.0242977437809347e-05,
702
+ "loss": 0.127,
703
  "step": 49500
704
  },
705
  {
706
  "epoch": 1.6069936363052002,
707
+ "grad_norm": 0.31220686435699463,
708
+ "learning_rate": 9.841229028733047e-06,
709
+ "loss": 0.1284,
710
  "step": 50000
711
  },
712
  {
713
  "epoch": 1.623063572668252,
714
+ "grad_norm": 0.3776426613330841,
715
+ "learning_rate": 9.439480619656748e-06,
716
+ "loss": 0.1251,
717
  "step": 50500
718
  },
719
  {
720
  "epoch": 1.6391335090313044,
721
+ "grad_norm": 0.2834898829460144,
722
+ "learning_rate": 9.037732210580447e-06,
723
+ "loss": 0.1226,
724
  "step": 51000
725
  },
726
  {
727
  "epoch": 1.6552034453943563,
728
+ "grad_norm": 0.2295331507921219,
729
+ "learning_rate": 8.635983801504147e-06,
730
+ "loss": 0.1233,
731
  "step": 51500
732
  },
733
  {
734
  "epoch": 1.6712733817574081,
735
+ "grad_norm": 0.22921015322208405,
736
+ "learning_rate": 8.234235392427848e-06,
737
+ "loss": 0.1256,
738
  "step": 52000
739
  },
740
  {
741
  "epoch": 1.6873433181204602,
742
+ "grad_norm": 0.3294677138328552,
743
+ "learning_rate": 7.832486983351546e-06,
744
+ "loss": 0.1257,
745
  "step": 52500
746
  },
747
  {
748
  "epoch": 1.7034132544835123,
749
+ "grad_norm": 0.21186766028404236,
750
+ "learning_rate": 7.430738574275246e-06,
751
+ "loss": 0.1254,
752
  "step": 53000
753
  },
754
  {
755
  "epoch": 1.7194831908465642,
756
+ "grad_norm": 0.43346577882766724,
757
+ "learning_rate": 7.029793662017099e-06,
758
+ "loss": 0.1228,
759
  "step": 53500
760
  },
761
  {
762
  "epoch": 1.7355531272096163,
763
+ "grad_norm": 0.20274986326694489,
764
+ "learning_rate": 6.628045252940798e-06,
765
+ "loss": 0.124,
766
  "step": 54000
767
  },
768
  {
769
  "epoch": 1.7516230635726684,
770
+ "grad_norm": 0.2912587523460388,
771
+ "learning_rate": 6.2262968438644984e-06,
772
+ "loss": 0.1236,
773
  "step": 54500
774
  },
775
  {
776
  "epoch": 1.7676929999357203,
777
+ "grad_norm": 0.5663316249847412,
778
+ "learning_rate": 5.824548434788198e-06,
779
+ "loss": 0.1236,
780
  "step": 55000
781
  },
782
  {
783
  "epoch": 1.7837629362987721,
784
+ "grad_norm": 0.2563399076461792,
785
+ "learning_rate": 5.423603522530051e-06,
786
+ "loss": 0.1241,
787
  "step": 55500
788
  },
789
  {
790
  "epoch": 1.7998328726618242,
791
+ "grad_norm": 0.26923516392707825,
792
  "learning_rate": 5.022658610271903e-06,
793
+ "loss": 0.1231,
794
  "step": 56000
795
  },
796
  {
797
  "epoch": 1.8159028090248763,
798
+ "grad_norm": 0.15516141057014465,
799
  "learning_rate": 4.620910201195604e-06,
800
+ "loss": 0.1225,
801
  "step": 56500
802
  },
803
  {
804
  "epoch": 1.8319727453879282,
805
+ "grad_norm": 0.1603991985321045,
806
  "learning_rate": 4.219161792119303e-06,
807
+ "loss": 0.1236,
808
  "step": 57000
809
  },
810
  {
811
  "epoch": 1.8480426817509803,
812
+ "grad_norm": 0.3031301498413086,
813
  "learning_rate": 3.817413383043003e-06,
814
+ "loss": 0.124,
815
  "step": 57500
816
  },
817
  {
818
  "epoch": 1.8641126181140324,
819
+ "grad_norm": 0.25160399079322815,
820
+ "learning_rate": 3.4156649739667035e-06,
821
+ "loss": 0.1212,
822
  "step": 58000
823
  },
824
  {
825
  "epoch": 1.8801825544770843,
826
+ "grad_norm": 0.23327353596687317,
827
+ "learning_rate": 3.013916564890403e-06,
828
+ "loss": 0.1199,
829
  "step": 58500
830
  },
831
  {
832
  "epoch": 1.8962524908401361,
833
+ "grad_norm": 0.23530858755111694,
834
  "learning_rate": 2.6129716526322558e-06,
835
+ "loss": 0.1228,
836
  "step": 59000
837
  },
838
  {
839
  "epoch": 1.9123224272031882,
840
+ "grad_norm": 0.20596709847450256,
841
  "learning_rate": 2.211223243555956e-06,
842
+ "loss": 0.1205,
843
  "step": 59500
844
  },
845
  {
846
  "epoch": 1.9283923635662403,
847
+ "grad_norm": 0.35043200850486755,
848
+ "learning_rate": 1.8094748344796555e-06,
849
+ "loss": 0.1188,
850
  "step": 60000
851
  },
852
  {
853
  "epoch": 1.9444622999292922,
854
+ "grad_norm": 0.21463052928447723,
855
+ "learning_rate": 1.4077264254033555e-06,
856
+ "loss": 0.1225,
857
  "step": 60500
858
  },
859
  {
860
  "epoch": 1.9605322362923443,
861
+ "grad_norm": 0.27506574988365173,
862
+ "learning_rate": 1.0059780163270554e-06,
863
+ "loss": 0.1233,
864
  "step": 61000
865
  },
866
  {
867
  "epoch": 1.9766021726553964,
868
+ "grad_norm": 0.3260590732097626,
869
+ "learning_rate": 6.042296072507553e-07,
870
+ "loss": 0.1218,
871
  "step": 61500
872
  },
873
  {
874
  "epoch": 1.9926721090184483,
875
+ "grad_norm": 0.2609338164329529,
876
+ "learning_rate": 2.0248119817445525e-07,
877
+ "loss": 0.1212,
878
  "step": 62000
879
  }
880
  ],
 
895
  "attributes": {}
896
  }
897
  },
898
+ "total_flos": 1.3425879637662106e+17,
899
  "train_batch_size": 32,
900
  "trial_name": null,
901
  "trial_params": null
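
The logged `learning_rate` values in the `trainer_state.json` diff above appear to follow a linear decay to zero. The sketch below checks that reading against two entries copied from the diff. The base rate of `5e-5` and the total of `62228` steps are assumptions inferred from the logged numbers and the final checkpoint name, not values stated in the commit. The function names are hypothetical. Small deviations are expected if some optimizer steps were skipped.

```python
def linear_decay_lr(step, total_steps=62228, base_lr=5e-5):
    """Learning rate under a plain linear decay to zero (assumed schedule)."""
    return base_lr * max(0, total_steps - step) / total_steps


def compare_logged_lr(log_history, total_steps=62228, base_lr=5e-5):
    """Return (step, logged_lr, expected_lr) for entries that log a learning rate."""
    rows = []
    for entry in log_history:
        if "learning_rate" in entry and "step" in entry:
            step = entry["step"]
            expected = linear_decay_lr(step, total_steps, base_lr)
            rows.append((step, entry["learning_rate"], expected))
    return rows


# Two log entries copied from the diff above.
sample = [
    {"step": 7000, "learning_rate": 4.4380343253840714e-05, "loss": 0.927},
    {"step": 62000, "learning_rate": 2.0248119817445525e-07, "loss": 0.1212},
]

for step, logged, expected in compare_logged_lr(sample):
    # Print the absolute gap so any drift from the assumed schedule is visible.
    print(step, logged, expected, abs(logged - expected))
```

To run the same comparison on a full file, load it with `json.load` and pass its `log_history` list to `compare_logged_lr`.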
checkpoints/checkpoint-62228/model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:7fee19ce79c6f45de80a2e273ede68b16d500dae3a2e3da26235d6b4ebc0f92e
+ oid sha256:05e935bcc5054a0b30234c54409c5479f2bd26857c4d8fb472a2417bfd1badd3
  size 242041896
checkpoints/checkpoint-62228/optimizer.pt CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:86714a27c46ac133d6d9ea7835d73e013ce66bf9fdd762718e18dc2826d7ca1b
+ oid sha256:0c54372bd61e30b7426b8e85a283f22608688432422ac7fceaf84d249cf8e2b0
  size 484163514
checkpoints/checkpoint-62228/rng_state.pth CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:18a17ab2d678c291632f4e799e0f8e429a6b4beb3bf190d75be1b7df3597fa44
+ oid sha256:f5d589f36728497480fc3e3ebdf78468879b4caf040cae34b6063cacd6c3c66f
  size 14244
checkpoints/checkpoint-62228/scaler.pt CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:db0587d2c7f25e21ab1d2cf91cf211b7d15b48dab687d8d10f7483541a03adb4
+ oid sha256:e08af9833b930adf68f3cffd3bcb35990d1548a9956216431a88f88a9f020248
  size 988
checkpoints/checkpoint-62228/scheduler.pt CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:6668451fe2db52de44fd5918c452bb44aa29396a4e9e2cd5118e290aececb3f1
+ oid sha256:cf0e9c969856e979c4c570e31baf3a86446426bf9fd91e807eb94934e82cb44b
  size 1064
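
Each of the binary checkpoint files above is stored as a Git LFS pointer with three `key value` lines (`version`, `oid`, `size`). As a minimal sketch, the hypothetical helper below parses one such pointer so that old and new digests can be compared programmatically. It assumes the three-line layout shown in the diff and does no validation beyond that.

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer (version / oid / size lines) into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        # Each line is "<key> <value>"; split on the first space only.
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    # The oid line has the form "<hash algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


# New-side pointer for scheduler.pt, copied from the diff above.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:cf0e9c969856e979c4c570e31baf3a86446426bf9fd91e807eb94934e82cb44b
size 1064
"""

info = parse_lfs_pointer(pointer)
print(info["hash_algo"], info["size"], info["digest"][:8])
```

In every pointer shown here the size is unchanged and only the digest differs, so the file contents changed without changing length.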
checkpoints/checkpoint-62228/special_tokens_map.json CHANGED
@@ -101,6 +101,13 @@
  "<extra_id_98>",
  "<extra_id_99>"
  ],
+ "bos_token": {
+ "content": "<s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
  "eos_token": {
  "content": "</s>",
  "lstrip": false,
checkpoints/checkpoint-62228/tokenizer.json CHANGED
The diff for this file is too large to render. See raw diff
 
checkpoints/checkpoint-62228/tokenizer_config.json CHANGED
@@ -10,7 +10,7 @@
10
  "special": true
11
  },
12
  "1": {
13
- "content": "</s>",
14
  "lstrip": false,
15
  "normalized": false,
16
  "rstrip": false,
@@ -18,414 +18,414 @@
18
  "special": true
19
  },
20
  "2": {
21
- "content": "<unk>",
22
  "lstrip": false,
23
  "normalized": false,
24
  "rstrip": false,
25
  "single_word": false,
26
  "special": true
27
  },
28
- "32000": {
29
- "content": "<extra_id_99>",
30
  "lstrip": false,
31
  "normalized": false,
32
  "rstrip": false,
33
  "single_word": false,
34
  "special": true
35
  },
36
- "32001": {
37
- "content": "<extra_id_98>",
38
  "lstrip": false,
39
  "normalized": false,
40
  "rstrip": false,
41
  "single_word": false,
42
  "special": true
43
  },
44
- "32002": {
45
- "content": "<extra_id_97>",
46
  "lstrip": false,
47
  "normalized": false,
48
  "rstrip": false,
49
  "single_word": false,
50
  "special": true
51
  },
52
- "32003": {
53
- "content": "<extra_id_96>",
54
  "lstrip": false,
55
  "normalized": false,
56
  "rstrip": false,
57
  "single_word": false,
58
  "special": true
59
  },
60
- "32004": {
61
- "content": "<extra_id_95>",
62
  "lstrip": false,
63
  "normalized": false,
64
  "rstrip": false,
65
  "single_word": false,
66
  "special": true
67
  },
68
- "32005": {
69
- "content": "<extra_id_94>",
70
  "lstrip": false,
71
  "normalized": false,
72
  "rstrip": false,
73
  "single_word": false,
74
  "special": true
75
  },
76
- "32006": {
77
- "content": "<extra_id_93>",
78
  "lstrip": false,
79
  "normalized": false,
80
  "rstrip": false,
81
  "single_word": false,
82
  "special": true
83
  },
84
- "32007": {
85
- "content": "<extra_id_92>",
86
  "lstrip": false,
87
  "normalized": false,
88
  "rstrip": false,
89
  "single_word": false,
90
  "special": true
91
  },
92
- "32008": {
93
- "content": "<extra_id_91>",
94
  "lstrip": false,
95
  "normalized": false,
96
  "rstrip": false,
97
  "single_word": false,
98
  "special": true
99
  },
100
- "32009": {
101
- "content": "<extra_id_90>",
102
  "lstrip": false,
103
  "normalized": false,
104
  "rstrip": false,
105
  "single_word": false,
106
  "special": true
107
  },
108
- "32010": {
109
- "content": "<extra_id_89>",
110
  "lstrip": false,
111
  "normalized": false,
112
  "rstrip": false,
113
  "single_word": false,
114
  "special": true
115
  },
116
- "32011": {
117
- "content": "<extra_id_88>",
118
  "lstrip": false,
119
  "normalized": false,
120
  "rstrip": false,
121
  "single_word": false,
122
  "special": true
123
  },
124
- "32012": {
125
- "content": "<extra_id_87>",
126
  "lstrip": false,
127
  "normalized": false,
128
  "rstrip": false,
129
  "single_word": false,
130
  "special": true
131
  },
132
- "32013": {
133
- "content": "<extra_id_86>",
134
  "lstrip": false,
135
  "normalized": false,
136
  "rstrip": false,
137
  "single_word": false,
138
  "special": true
139
  },
140
- "32014": {
141
- "content": "<extra_id_85>",
142
  "lstrip": false,
143
  "normalized": false,
144
  "rstrip": false,
145
  "single_word": false,
146
  "special": true
147
  },
148
- "32015": {
149
- "content": "<extra_id_84>",
150
  "lstrip": false,
151
  "normalized": false,
152
  "rstrip": false,
153
  "single_word": false,
154
  "special": true
155
  },
156
- "32016": {
157
- "content": "<extra_id_83>",
158
  "lstrip": false,
159
  "normalized": false,
160
  "rstrip": false,
161
  "single_word": false,
162
  "special": true
163
  },
164
- "32017": {
165
- "content": "<extra_id_82>",
166
  "lstrip": false,
167
  "normalized": false,
168
  "rstrip": false,
169
  "single_word": false,
170
  "special": true
171
  },
172
- "32018": {
173
- "content": "<extra_id_81>",
174
  "lstrip": false,
175
  "normalized": false,
176
  "rstrip": false,
177
  "single_word": false,
178
  "special": true
179
  },
180
- "32019": {
181
- "content": "<extra_id_80>",
182
  "lstrip": false,
183
  "normalized": false,
184
  "rstrip": false,
185
  "single_word": false,
186
  "special": true
187
  },
188
- "32020": {
189
- "content": "<extra_id_79>",
190
  "lstrip": false,
191
  "normalized": false,
192
  "rstrip": false,
193
  "single_word": false,
194
  "special": true
195
  },
196
- "32021": {
197
- "content": "<extra_id_78>",
198
  "lstrip": false,
199
  "normalized": false,
200
  "rstrip": false,
201
  "single_word": false,
202
  "special": true
203
  },
204
- "32022": {
205
- "content": "<extra_id_77>",
206
  "lstrip": false,
207
  "normalized": false,
208
  "rstrip": false,
209
  "single_word": false,
210
  "special": true
211
  },
212
- "32023": {
213
- "content": "<extra_id_76>",
214
  "lstrip": false,
215
  "normalized": false,
216
  "rstrip": false,
217
  "single_word": false,
218
  "special": true
219
  },
220
- "32024": {
221
- "content": "<extra_id_75>",
222
  "lstrip": false,
223
  "normalized": false,
224
  "rstrip": false,
225
  "single_word": false,
226
  "special": true
227
  },
228
- "32025": {
229
- "content": "<extra_id_74>",
230
  "lstrip": false,
231
  "normalized": false,
232
  "rstrip": false,
233
  "single_word": false,
234
  "special": true
235
  },
236
- "32026": {
237
- "content": "<extra_id_73>",
238
  "lstrip": false,
239
  "normalized": false,
240
  "rstrip": false,
241
  "single_word": false,
242
  "special": true
243
  },
244
- "32027": {
245
- "content": "<extra_id_72>",
246
  "lstrip": false,
247
  "normalized": false,
248
  "rstrip": false,
249
  "single_word": false,
250
  "special": true
251
  },
252
- "32028": {
253
- "content": "<extra_id_71>",
254
  "lstrip": false,
255
  "normalized": false,
256
  "rstrip": false,
257
  "single_word": false,
258
  "special": true
259
  },
260
- "32029": {
261
- "content": "<extra_id_70>",
262
  "lstrip": false,
263
  "normalized": false,
264
  "rstrip": false,
265
  "single_word": false,
266
  "special": true
267
  },
268
- "32030": {
269
- "content": "<extra_id_69>",
270
  "lstrip": false,
271
  "normalized": false,
272
  "rstrip": false,
273
  "single_word": false,
274
  "special": true
275
  },
276
- "32031": {
277
- "content": "<extra_id_68>",
278
  "lstrip": false,
279
  "normalized": false,
280
  "rstrip": false,
281
  "single_word": false,
282
  "special": true
283
  },
284
- "32032": {
285
- "content": "<extra_id_67>",
286
  "lstrip": false,
287
  "normalized": false,
288
  "rstrip": false,
289
  "single_word": false,
290
  "special": true
291
  },
292
- "32033": {
293
- "content": "<extra_id_66>",
294
  "lstrip": false,
295
  "normalized": false,
296
  "rstrip": false,
297
  "single_word": false,
298
  "special": true
299
  },
300
- "32034": {
301
- "content": "<extra_id_65>",
302
  "lstrip": false,
303
  "normalized": false,
304
  "rstrip": false,
305
  "single_word": false,
306
  "special": true
307
  },
308
- "32035": {
309
- "content": "<extra_id_64>",
310
  "lstrip": false,
311
  "normalized": false,
312
  "rstrip": false,
313
  "single_word": false,
314
  "special": true
315
  },
316
- "32036": {
317
- "content": "<extra_id_63>",
318
  "lstrip": false,
319
  "normalized": false,
320
  "rstrip": false,
321
  "single_word": false,
322
  "special": true
323
  },
324
- "32037": {
325
- "content": "<extra_id_62>",
326
  "lstrip": false,
327
  "normalized": false,
328
  "rstrip": false,
329
  "single_word": false,
330
  "special": true
331
  },
332
- "32038": {
333
- "content": "<extra_id_61>",
334
  "lstrip": false,
335
  "normalized": false,
336
  "rstrip": false,
337
  "single_word": false,
338
  "special": true
339
  },
340
- "32039": {
341
- "content": "<extra_id_60>",
342
  "lstrip": false,
343
  "normalized": false,
344
  "rstrip": false,
345
  "single_word": false,
346
  "special": true
347
  },
348
- "32040": {
349
- "content": "<extra_id_59>",
350
  "lstrip": false,
351
  "normalized": false,
352
  "rstrip": false,
353
  "single_word": false,
354
  "special": true
355
  },
356
- "32041": {
357
- "content": "<extra_id_58>",
358
  "lstrip": false,
359
  "normalized": false,
360
  "rstrip": false,
361
  "single_word": false,
362
  "special": true
363
  },
364
- "32042": {
365
- "content": "<extra_id_57>",
366
  "lstrip": false,
367
  "normalized": false,
368
  "rstrip": false,
369
  "single_word": false,
370
  "special": true
371
  },
372
- "32043": {
373
- "content": "<extra_id_56>",
374
  "lstrip": false,
375
  "normalized": false,
376
  "rstrip": false,
377
  "single_word": false,
378
  "special": true
379
  },
380
- "32044": {
381
- "content": "<extra_id_55>",
382
  "lstrip": false,
383
  "normalized": false,
384
  "rstrip": false,
385
  "single_word": false,
386
  "special": true
387
  },
388
- "32045": {
389
- "content": "<extra_id_54>",
390
  "lstrip": false,
391
  "normalized": false,
392
  "rstrip": false,
393
  "single_word": false,
394
  "special": true
395
  },
396
- "32046": {
397
- "content": "<extra_id_53>",
398
  "lstrip": false,
399
  "normalized": false,
400
  "rstrip": false,
401
  "single_word": false,
402
  "special": true
403
  },
404
- "32047": {
405
- "content": "<extra_id_52>",
406
  "lstrip": false,
407
  "normalized": false,
408
  "rstrip": false,
409
  "single_word": false,
410
  "special": true
411
  },
412
- "32048": {
413
- "content": "<extra_id_51>",
414
  "lstrip": false,
415
  "normalized": false,
416
  "rstrip": false,
417
  "single_word": false,
418
  "special": true
419
  },
420
- "32049": {
421
- "content": "<extra_id_50>",
422
  "lstrip": false,
423
  "normalized": false,
424
  "rstrip": false,
425
  "single_word": false,
426
  "special": true
427
  },
428
- "32050": {
429
  "content": "<extra_id_49>",
430
  "lstrip": false,
431
  "normalized": false,
@@ -433,392 +433,400 @@
433
  "single_word": false,
434
  "special": true
435
  },
436
- "32051": {
437
- "content": "<extra_id_48>",
438
  "lstrip": false,
439
  "normalized": false,
440
  "rstrip": false,
441
  "single_word": false,
442
  "special": true
443
  },
444
- "32052": {
445
- "content": "<extra_id_47>",
446
  "lstrip": false,
447
  "normalized": false,
448
  "rstrip": false,
449
  "single_word": false,
450
  "special": true
451
  },
452
- "32053": {
453
- "content": "<extra_id_46>",
454
  "lstrip": false,
455
  "normalized": false,
456
  "rstrip": false,
457
  "single_word": false,
458
  "special": true
459
  },
460
- "32054": {
461
- "content": "<extra_id_45>",
462
  "lstrip": false,
463
  "normalized": false,
464
  "rstrip": false,
465
  "single_word": false,
466
  "special": true
467
  },
468
- "32055": {
469
- "content": "<extra_id_44>",
470
  "lstrip": false,
471
  "normalized": false,
472
  "rstrip": false,
473
  "single_word": false,
474
  "special": true
475
  },
476
- "32056": {
477
- "content": "<extra_id_43>",
478
  "lstrip": false,
479
  "normalized": false,
480
  "rstrip": false,
481
  "single_word": false,
482
  "special": true
483
  },
484
- "32057": {
485
- "content": "<extra_id_42>",
486
  "lstrip": false,
487
  "normalized": false,
488
  "rstrip": false,
489
  "single_word": false,
490
  "special": true
491
  },
492
- "32058": {
493
- "content": "<extra_id_41>",
494
  "lstrip": false,
495
  "normalized": false,
496
  "rstrip": false,
497
  "single_word": false,
498
  "special": true
499
  },
500
- "32059": {
501
- "content": "<extra_id_40>",
502
  "lstrip": false,
503
  "normalized": false,
504
  "rstrip": false,
505
  "single_word": false,
506
  "special": true
507
  },
508
- "32060": {
509
- "content": "<extra_id_39>",
510
  "lstrip": false,
511
  "normalized": false,
512
  "rstrip": false,
513
  "single_word": false,
514
  "special": true
515
  },
516
- "32061": {
517
- "content": "<extra_id_38>",
518
  "lstrip": false,
519
  "normalized": false,
520
  "rstrip": false,
521
  "single_word": false,
522
  "special": true
523
  },
524
- "32062": {
525
- "content": "<extra_id_37>",
526
  "lstrip": false,
527
  "normalized": false,
528
  "rstrip": false,
529
  "single_word": false,
530
  "special": true
531
  },
532
- "32063": {
533
- "content": "<extra_id_36>",
534
  "lstrip": false,
535
  "normalized": false,
536
  "rstrip": false,
537
  "single_word": false,
538
  "special": true
539
  },
540
- "32064": {
541
- "content": "<extra_id_35>",
542
  "lstrip": false,
543
  "normalized": false,
544
  "rstrip": false,
545
  "single_word": false,
546
  "special": true
547
  },
548
- "32065": {
549
- "content": "<extra_id_34>",
550
  "lstrip": false,
551
  "normalized": false,
552
  "rstrip": false,
553
  "single_word": false,
554
  "special": true
555
  },
556
- "32066": {
557
- "content": "<extra_id_33>",
558
  "lstrip": false,
559
  "normalized": false,
560
  "rstrip": false,
561
  "single_word": false,
562
  "special": true
563
  },
564
- "32067": {
565
- "content": "<extra_id_32>",
566
  "lstrip": false,
567
  "normalized": false,
568
  "rstrip": false,
569
  "single_word": false,
570
  "special": true
571
  },
572
- "32068": {
573
- "content": "<extra_id_31>",
574
  "lstrip": false,
575
  "normalized": false,
576
  "rstrip": false,
577
  "single_word": false,
578
  "special": true
579
  },
580
- "32069": {
581
- "content": "<extra_id_30>",
582
  "lstrip": false,
583
  "normalized": false,
584
  "rstrip": false,
585
  "single_word": false,
586
  "special": true
587
  },
588
- "32070": {
589
- "content": "<extra_id_29>",
590
  "lstrip": false,
591
  "normalized": false,
592
  "rstrip": false,
593
  "single_word": false,
594
  "special": true
595
  },
596
- "32071": {
597
- "content": "<extra_id_28>",
598
  "lstrip": false,
599
  "normalized": false,
600
  "rstrip": false,
601
  "single_word": false,
602
  "special": true
603
  },
604
- "32072": {
605
- "content": "<extra_id_27>",
606
  "lstrip": false,
607
  "normalized": false,
608
  "rstrip": false,
609
  "single_word": false,
610
  "special": true
611
  },
612
- "32073": {
613
- "content": "<extra_id_26>",
614
  "lstrip": false,
615
  "normalized": false,
616
  "rstrip": false,
617
  "single_word": false,
618
  "special": true
619
  },
620
- "32074": {
621
- "content": "<extra_id_25>",
622
  "lstrip": false,
623
  "normalized": false,
624
  "rstrip": false,
625
  "single_word": false,
626
  "special": true
627
  },
628
- "32075": {
629
- "content": "<extra_id_24>",
630
  "lstrip": false,
631
  "normalized": false,
632
  "rstrip": false,
633
  "single_word": false,
634
  "special": true
635
  },
636
- "32076": {
637
- "content": "<extra_id_23>",
638
  "lstrip": false,
639
  "normalized": false,
640
  "rstrip": false,
641
  "single_word": false,
642
  "special": true
643
  },
644
- "32077": {
645
- "content": "<extra_id_22>",
646
  "lstrip": false,
647
  "normalized": false,
648
  "rstrip": false,
649
  "single_word": false,
650
  "special": true
651
  },
652
- "32078": {
653
- "content": "<extra_id_21>",
654
  "lstrip": false,
655
  "normalized": false,
656
  "rstrip": false,
657
  "single_word": false,
658
  "special": true
659
  },
660
- "32079": {
661
- "content": "<extra_id_20>",
662
  "lstrip": false,
663
  "normalized": false,
664
  "rstrip": false,
665
  "single_word": false,
666
  "special": true
667
  },
668
- "32080": {
669
- "content": "<extra_id_19>",
670
  "lstrip": false,
671
  "normalized": false,
672
  "rstrip": false,
673
  "single_word": false,
674
  "special": true
675
  },
676
- "32081": {
677
- "content": "<extra_id_18>",
678
  "lstrip": false,
679
  "normalized": false,
680
  "rstrip": false,
681
  "single_word": false,
682
  "special": true
683
  },
684
- "32082": {
685
- "content": "<extra_id_17>",
686
  "lstrip": false,
687
  "normalized": false,
688
  "rstrip": false,
689
  "single_word": false,
690
  "special": true
691
  },
692
- "32083": {
693
- "content": "<extra_id_16>",
694
  "lstrip": false,
695
  "normalized": false,
696
  "rstrip": false,
697
  "single_word": false,
698
  "special": true
699
  },
700
- "32084": {
701
- "content": "<extra_id_15>",
702
  "lstrip": false,
703
  "normalized": false,
704
  "rstrip": false,
705
  "single_word": false,
706
  "special": true
707
  },
708
- "32085": {
709
- "content": "<extra_id_14>",
710
  "lstrip": false,
711
  "normalized": false,
712
  "rstrip": false,
713
  "single_word": false,
714
  "special": true
715
  },
716
- "32086": {
717
- "content": "<extra_id_13>",
718
  "lstrip": false,
719
  "normalized": false,
720
  "rstrip": false,
721
  "single_word": false,
722
  "special": true
723
  },
724
- "32087": {
725
- "content": "<extra_id_12>",
726
  "lstrip": false,
727
  "normalized": false,
728
  "rstrip": false,
729
  "single_word": false,
730
  "special": true
731
  },
732
- "32088": {
733
- "content": "<extra_id_11>",
734
  "lstrip": false,
735
  "normalized": false,
736
  "rstrip": false,
737
  "single_word": false,
738
  "special": true
739
  },
740
- "32089": {
741
- "content": "<extra_id_10>",
742
  "lstrip": false,
743
  "normalized": false,
744
  "rstrip": false,
745
  "single_word": false,
746
  "special": true
747
  },
748
- "32090": {
749
- "content": "<extra_id_9>",
750
  "lstrip": false,
751
  "normalized": false,
752
  "rstrip": false,
753
  "single_word": false,
754
  "special": true
755
  },
756
- "32091": {
757
- "content": "<extra_id_8>",
758
  "lstrip": false,
759
  "normalized": false,
760
  "rstrip": false,
761
  "single_word": false,
762
  "special": true
763
  },
764
- "32092": {
765
- "content": "<extra_id_7>",
766
  "lstrip": false,
767
  "normalized": false,
768
  "rstrip": false,
769
  "single_word": false,
770
  "special": true
771
  },
772
- "32093": {
773
- "content": "<extra_id_6>",
774
  "lstrip": false,
775
  "normalized": false,
776
  "rstrip": false,
777
  "single_word": false,
778
  "special": true
779
  },
780
- "32094": {
781
- "content": "<extra_id_5>",
782
  "lstrip": false,
783
  "normalized": false,
784
  "rstrip": false,
785
  "single_word": false,
786
  "special": true
787
  },
788
- "32095": {
789
- "content": "<extra_id_4>",
790
  "lstrip": false,
791
  "normalized": false,
792
  "rstrip": false,
793
  "single_word": false,
794
  "special": true
795
  },
796
- "32096": {
797
- "content": "<extra_id_3>",
798
  "lstrip": false,
799
  "normalized": false,
800
  "rstrip": false,
801
  "single_word": false,
802
  "special": true
803
  },
804
- "32097": {
805
- "content": "<extra_id_2>",
806
  "lstrip": false,
807
  "normalized": false,
808
  "rstrip": false,
809
  "single_word": false,
810
  "special": true
811
  },
812
- "32098": {
813
- "content": "<extra_id_1>",
814
  "lstrip": false,
815
  "normalized": false,
816
  "rstrip": false,
817
  "single_word": false,
818
  "special": true
819
  },
820
- "32099": {
821
- "content": "<extra_id_0>",
822
  "lstrip": false,
823
  "normalized": false,
824
  "rstrip": false,
@@ -928,12 +936,13 @@
928
  "<extra_id_98>",
929
  "<extra_id_99>"
930
  ],
931
- "clean_up_tokenization_spaces": true,
 
932
  "eos_token": "</s>",
933
  "extra_ids": 100,
934
  "extra_special_tokens": {},
935
- "model_max_length": 512,
936
  "pad_token": "<pad>",
937
- "tokenizer_class": "T5Tokenizer",
938
  "unk_token": "<unk>"
939
  }
 
10
  "special": true
11
  },
12
  "1": {
13
+ "content": "<s>",
14
  "lstrip": false,
15
  "normalized": false,
16
  "rstrip": false,
 
18
  "special": true
19
  },
20
  "2": {
21
+ "content": "</s>",
22
  "lstrip": false,
23
  "normalized": false,
24
  "rstrip": false,
25
  "single_word": false,
26
  "special": true
27
  },
28
+ "3": {
29
+ "content": "<unk>",
30
  "lstrip": false,
31
  "normalized": false,
32
  "rstrip": false,
33
  "single_word": false,
34
  "special": true
35
  },
36
+ "8000": {
37
+ "content": "<extra_id_0>",
38
  "lstrip": false,
39
  "normalized": false,
40
  "rstrip": false,
41
  "single_word": false,
42
  "special": true
43
  },
44
+ "8001": {
45
+ "content": "<extra_id_1>",
46
  "lstrip": false,
47
  "normalized": false,
48
  "rstrip": false,
49
  "single_word": false,
50
  "special": true
51
  },
52
+ "8002": {
53
+ "content": "<extra_id_2>",
54
  "lstrip": false,
55
  "normalized": false,
56
  "rstrip": false,
57
  "single_word": false,
58
  "special": true
59
  },
60
+ "8003": {
61
+ "content": "<extra_id_3>",
62
  "lstrip": false,
63
  "normalized": false,
64
  "rstrip": false,
65
  "single_word": false,
66
  "special": true
67
  },
68
+ "8004": {
69
+ "content": "<extra_id_4>",
70
  "lstrip": false,
71
  "normalized": false,
72
  "rstrip": false,
73
  "single_word": false,
74
  "special": true
75
  },
76
+ "8005": {
77
+ "content": "<extra_id_5>",
78
  "lstrip": false,
79
  "normalized": false,
80
  "rstrip": false,
81
  "single_word": false,
82
  "special": true
83
  },
84
+ "8006": {
85
+ "content": "<extra_id_6>",
86
  "lstrip": false,
87
  "normalized": false,
88
  "rstrip": false,
89
  "single_word": false,
90
  "special": true
91
  },
92
+ "8007": {
93
+ "content": "<extra_id_7>",
94
  "lstrip": false,
95
  "normalized": false,
96
  "rstrip": false,
97
  "single_word": false,
98
  "special": true
99
  },
100
+ "8008": {
101
+ "content": "<extra_id_8>",
102
  "lstrip": false,
103
  "normalized": false,
104
  "rstrip": false,
105
  "single_word": false,
106
  "special": true
107
  },
108
+ "8009": {
109
+ "content": "<extra_id_9>",
110
  "lstrip": false,
111
  "normalized": false,
112
  "rstrip": false,
113
  "single_word": false,
114
  "special": true
115
  },
116
+ "8010": {
117
+ "content": "<extra_id_10>",
118
  "lstrip": false,
119
  "normalized": false,
120
  "rstrip": false,
121
  "single_word": false,
122
  "special": true
123
  },
124
+ "8011": {
125
+ "content": "<extra_id_11>",
126
  "lstrip": false,
127
  "normalized": false,
128
  "rstrip": false,
129
  "single_word": false,
130
  "special": true
131
  },
132
+ "8012": {
133
+ "content": "<extra_id_12>",
134
  "lstrip": false,
135
  "normalized": false,
136
  "rstrip": false,
137
  "single_word": false,
138
  "special": true
139
  },
140
+ "8013": {
141
+ "content": "<extra_id_13>",
142
  "lstrip": false,
143
  "normalized": false,
144
  "rstrip": false,
145
  "single_word": false,
146
  "special": true
147
  },
148
+ "8014": {
149
+ "content": "<extra_id_14>",
150
  "lstrip": false,
151
  "normalized": false,
152
  "rstrip": false,
153
  "single_word": false,
154
  "special": true
155
  },
156
+ "8015": {
157
+ "content": "<extra_id_15>",
158
  "lstrip": false,
159
  "normalized": false,
160
  "rstrip": false,
161
  "single_word": false,
162
  "special": true
163
  },
164
+ "8016": {
165
+ "content": "<extra_id_16>",
166
  "lstrip": false,
167
  "normalized": false,
168
  "rstrip": false,
169
  "single_word": false,
170
  "special": true
171
  },
172
+ "8017": {
173
+ "content": "<extra_id_17>",
174
  "lstrip": false,
175
  "normalized": false,
176
  "rstrip": false,
177
  "single_word": false,
178
  "special": true
179
  },
180
+ "8018": {
181
+ "content": "<extra_id_18>",
182
  "lstrip": false,
183
  "normalized": false,
184
  "rstrip": false,
185
  "single_word": false,
186
  "special": true
187
  },
188
+ "8019": {
189
+ "content": "<extra_id_19>",
190
  "lstrip": false,
191
  "normalized": false,
192
  "rstrip": false,
193
  "single_word": false,
194
  "special": true
195
  },
196
+ "8020": {
197
+ "content": "<extra_id_20>",
198
  "lstrip": false,
199
  "normalized": false,
200
  "rstrip": false,
201
  "single_word": false,
202
  "special": true
203
  },
204
+ "8021": {
205
+ "content": "<extra_id_21>",
206
  "lstrip": false,
207
  "normalized": false,
208
  "rstrip": false,
209
  "single_word": false,
210
  "special": true
211
  },
212
+ "8022": {
213
+ "content": "<extra_id_22>",
214
  "lstrip": false,
215
  "normalized": false,
216
  "rstrip": false,
217
  "single_word": false,
218
  "special": true
219
  },
220
+ "8023": {
221
+ "content": "<extra_id_23>",
222
  "lstrip": false,
223
  "normalized": false,
224
  "rstrip": false,
225
  "single_word": false,
226
  "special": true
227
  },
228
+ "8024": {
229
+ "content": "<extra_id_24>",
230
  "lstrip": false,
231
  "normalized": false,
232
  "rstrip": false,
233
  "single_word": false,
234
  "special": true
235
  },
236
+ "8025": {
237
+ "content": "<extra_id_25>",
238
  "lstrip": false,
239
  "normalized": false,
240
  "rstrip": false,
241
  "single_word": false,
242
  "special": true
243
  },
244
+ "8026": {
245
+ "content": "<extra_id_26>",
246
  "lstrip": false,
247
  "normalized": false,
248
  "rstrip": false,
249
  "single_word": false,
250
  "special": true
251
  },
252
+ "8027": {
253
+ "content": "<extra_id_27>",
254
  "lstrip": false,
255
  "normalized": false,
256
  "rstrip": false,
257
  "single_word": false,
258
  "special": true
259
  },
260
+ "8028": {
261
+ "content": "<extra_id_28>",
262
  "lstrip": false,
263
  "normalized": false,
264
  "rstrip": false,
265
  "single_word": false,
266
  "special": true
267
  },
268
+ "8029": {
269
+ "content": "<extra_id_29>",
270
  "lstrip": false,
271
  "normalized": false,
272
  "rstrip": false,
273
  "single_word": false,
274
  "special": true
275
  },
276
+ "8030": {
277
+ "content": "<extra_id_30>",
278
  "lstrip": false,
279
  "normalized": false,
280
  "rstrip": false,
281
  "single_word": false,
282
  "special": true
283
  },
284
+ "8031": {
285
+ "content": "<extra_id_31>",
286
  "lstrip": false,
287
  "normalized": false,
288
  "rstrip": false,
289
  "single_word": false,
290
  "special": true
291
  },
292
+ "8032": {
293
+ "content": "<extra_id_32>",
294
  "lstrip": false,
295
  "normalized": false,
296
  "rstrip": false,
297
  "single_word": false,
298
  "special": true
299
  },
300
+ "8033": {
301
+ "content": "<extra_id_33>",
302
  "lstrip": false,
303
  "normalized": false,
304
  "rstrip": false,
305
  "single_word": false,
306
  "special": true
307
  },
308
+ "8034": {
309
+ "content": "<extra_id_34>",
310
  "lstrip": false,
311
  "normalized": false,
312
  "rstrip": false,
313
  "single_word": false,
314
  "special": true
315
  },
316
+ "8035": {
317
+ "content": "<extra_id_35>",
318
  "lstrip": false,
319
  "normalized": false,
320
  "rstrip": false,
321
  "single_word": false,
322
  "special": true
323
  },
324
+ "8036": {
325
+ "content": "<extra_id_36>",
326
  "lstrip": false,
327
  "normalized": false,
328
  "rstrip": false,
329
  "single_word": false,
330
  "special": true
331
  },
332
+ "8037": {
333
+ "content": "<extra_id_37>",
334
  "lstrip": false,
335
  "normalized": false,
336
  "rstrip": false,
337
  "single_word": false,
338
  "special": true
339
  },
340
+ "8038": {
341
+ "content": "<extra_id_38>",
342
  "lstrip": false,
343
  "normalized": false,
344
  "rstrip": false,
345
  "single_word": false,
346
  "special": true
347
  },
348
+ "8039": {
349
+ "content": "<extra_id_39>",
350
  "lstrip": false,
351
  "normalized": false,
352
  "rstrip": false,
353
  "single_word": false,
354
  "special": true
355
  },
356
+ "8040": {
357
+ "content": "<extra_id_40>",
358
  "lstrip": false,
359
  "normalized": false,
360
  "rstrip": false,
361
  "single_word": false,
362
  "special": true
363
  },
364
+ "8041": {
365
+ "content": "<extra_id_41>",
366
  "lstrip": false,
367
  "normalized": false,
368
  "rstrip": false,
369
  "single_word": false,
370
  "special": true
371
  },
372
+ "8042": {
373
+ "content": "<extra_id_42>",
374
  "lstrip": false,
375
  "normalized": false,
376
  "rstrip": false,
377
  "single_word": false,
378
  "special": true
379
  },
380
+ "8043": {
381
+ "content": "<extra_id_43>",
382
  "lstrip": false,
383
  "normalized": false,
384
  "rstrip": false,
385
  "single_word": false,
386
  "special": true
387
  },
388
+ "8044": {
389
+ "content": "<extra_id_44>",
390
  "lstrip": false,
391
  "normalized": false,
392
  "rstrip": false,
393
  "single_word": false,
394
  "special": true
395
  },
396
+ "8045": {
397
+ "content": "<extra_id_45>",
398
  "lstrip": false,
399
  "normalized": false,
400
  "rstrip": false,
401
  "single_word": false,
402
  "special": true
403
  },
404
+ "8046": {
405
+ "content": "<extra_id_46>",
406
  "lstrip": false,
407
  "normalized": false,
408
  "rstrip": false,
409
  "single_word": false,
410
  "special": true
411
  },
412
+ "8047": {
413
+ "content": "<extra_id_47>",
414
  "lstrip": false,
415
  "normalized": false,
416
  "rstrip": false,
417
  "single_word": false,
418
  "special": true
419
  },
420
+ "8048": {
421
+ "content": "<extra_id_48>",
422
  "lstrip": false,
423
  "normalized": false,
424
  "rstrip": false,
425
  "single_word": false,
426
  "special": true
427
  },
428
+ "8049": {
429
  "content": "<extra_id_49>",
430
  "lstrip": false,
431
  "normalized": false,
 
433
  "single_word": false,
434
  "special": true
435
  },
436
+ "8050": {
437
+ "content": "<extra_id_50>",
438
  "lstrip": false,
439
  "normalized": false,
440
  "rstrip": false,
441
  "single_word": false,
442
  "special": true
443
  },
444
+ "8051": {
445
+ "content": "<extra_id_51>",
446
  "lstrip": false,
447
  "normalized": false,
448
  "rstrip": false,
449
  "single_word": false,
450
  "special": true
451
  },
452
+ "8052": {
453
+ "content": "<extra_id_52>",
454
  "lstrip": false,
455
  "normalized": false,
456
  "rstrip": false,
457
  "single_word": false,
458
  "special": true
459
  },
460
+ "8053": {
461
+ "content": "<extra_id_53>",
462
  "lstrip": false,
463
  "normalized": false,
464
  "rstrip": false,
465
  "single_word": false,
466
  "special": true
467
  },
468
+ "8054": {
469
+ "content": "<extra_id_54>",
470
  "lstrip": false,
471
  "normalized": false,
472
  "rstrip": false,
473
  "single_word": false,
474
  "special": true
475
  },
476
+ "8055": {
477
+ "content": "<extra_id_55>",
478
  "lstrip": false,
479
  "normalized": false,
480
  "rstrip": false,
481
  "single_word": false,
482
  "special": true
483
  },
484
+ "8056": {
485
+ "content": "<extra_id_56>",
486
  "lstrip": false,
487
  "normalized": false,
488
  "rstrip": false,
489
  "single_word": false,
490
  "special": true
491
  },
492
+ "8057": {
493
+ "content": "<extra_id_57>",
494
  "lstrip": false,
495
  "normalized": false,
496
  "rstrip": false,
497
  "single_word": false,
498
  "special": true
499
  },
500
+ "8058": {
501
+ "content": "<extra_id_58>",
502
  "lstrip": false,
503
  "normalized": false,
504
  "rstrip": false,
505
  "single_word": false,
506
  "special": true
507
  },
508
+ "8059": {
509
+ "content": "<extra_id_59>",
510
  "lstrip": false,
511
  "normalized": false,
512
  "rstrip": false,
513
  "single_word": false,
514
  "special": true
515
  },
516
+ "8060": {
517
+ "content": "<extra_id_60>",
518
  "lstrip": false,
519
  "normalized": false,
520
  "rstrip": false,
521
  "single_word": false,
522
  "special": true
523
  },
524
+ "8061": {
525
+ "content": "<extra_id_61>",
526
  "lstrip": false,
527
  "normalized": false,
528
  "rstrip": false,
529
  "single_word": false,
530
  "special": true
531
  },
532
+ "8062": {
533
+ "content": "<extra_id_62>",
534
  "lstrip": false,
535
  "normalized": false,
536
  "rstrip": false,
537
  "single_word": false,
538
  "special": true
539
  },
540
+ "8063": {
541
+ "content": "<extra_id_63>",
542
  "lstrip": false,
543
  "normalized": false,
544
  "rstrip": false,
545
  "single_word": false,
546
  "special": true
547
  },
548
+ "8064": {
549
+ "content": "<extra_id_64>",
550
  "lstrip": false,
551
  "normalized": false,
552
  "rstrip": false,
553
  "single_word": false,
554
  "special": true
555
  },
556
+ "8065": {
557
+ "content": "<extra_id_65>",
558
  "lstrip": false,
559
  "normalized": false,
560
  "rstrip": false,
561
  "single_word": false,
562
  "special": true
563
  },
564
+ "8066": {
565
+ "content": "<extra_id_66>",
566
  "lstrip": false,
567
  "normalized": false,
568
  "rstrip": false,
569
  "single_word": false,
570
  "special": true
571
  },
572
+ "8067": {
573
+ "content": "<extra_id_67>",
574
  "lstrip": false,
575
  "normalized": false,
576
  "rstrip": false,
577
  "single_word": false,
578
  "special": true
579
  },
580
+ "8068": {
581
+ "content": "<extra_id_68>",
582
  "lstrip": false,
583
  "normalized": false,
584
  "rstrip": false,
585
  "single_word": false,
586
  "special": true
587
  },
588
+ "8069": {
589
+ "content": "<extra_id_69>",
590
  "lstrip": false,
591
  "normalized": false,
592
  "rstrip": false,
593
  "single_word": false,
594
  "special": true
595
  },
596
+ "8070": {
597
+ "content": "<extra_id_70>",
598
  "lstrip": false,
599
  "normalized": false,
600
  "rstrip": false,
601
  "single_word": false,
602
  "special": true
603
  },
604
+ "8071": {
605
+ "content": "<extra_id_71>",
606
  "lstrip": false,
607
  "normalized": false,
608
  "rstrip": false,
609
  "single_word": false,
610
  "special": true
611
  },
612
+ "8072": {
613
+ "content": "<extra_id_72>",
614
  "lstrip": false,
615
  "normalized": false,
616
  "rstrip": false,
617
  "single_word": false,
618
  "special": true
619
  },
620
+ "8073": {
621
+ "content": "<extra_id_73>",
622
  "lstrip": false,
623
  "normalized": false,
624
  "rstrip": false,
625
  "single_word": false,
626
  "special": true
627
  },
628
+ "8074": {
629
+ "content": "<extra_id_74>",
630
  "lstrip": false,
631
  "normalized": false,
632
  "rstrip": false,
633
  "single_word": false,
634
  "special": true
635
  },
636
+ "8075": {
637
+ "content": "<extra_id_75>",
638
  "lstrip": false,
639
  "normalized": false,
640
  "rstrip": false,
641
  "single_word": false,
642
  "special": true
643
  },
644
+ "8076": {
645
+ "content": "<extra_id_76>",
646
  "lstrip": false,
647
  "normalized": false,
648
  "rstrip": false,
649
  "single_word": false,
650
  "special": true
651
  },
652
+ "8077": {
653
+ "content": "<extra_id_77>",
654
  "lstrip": false,
655
  "normalized": false,
656
  "rstrip": false,
657
  "single_word": false,
658
  "special": true
659
  },
660
+ "8078": {
661
+ "content": "<extra_id_78>",
662
  "lstrip": false,
663
  "normalized": false,
664
  "rstrip": false,
665
  "single_word": false,
666
  "special": true
667
  },
668
+ "8079": {
669
+ "content": "<extra_id_79>",
670
  "lstrip": false,
671
  "normalized": false,
672
  "rstrip": false,
673
  "single_word": false,
674
  "special": true
675
  },
676
+ "8080": {
677
+ "content": "<extra_id_80>",
678
  "lstrip": false,
679
  "normalized": false,
680
  "rstrip": false,
681
  "single_word": false,
682
  "special": true
683
  },
684
+ "8081": {
685
+ "content": "<extra_id_81>",
686
  "lstrip": false,
687
  "normalized": false,
688
  "rstrip": false,
689
  "single_word": false,
690
  "special": true
691
  },
692
+ "8082": {
693
+ "content": "<extra_id_82>",
694
  "lstrip": false,
695
  "normalized": false,
696
  "rstrip": false,
697
  "single_word": false,
698
  "special": true
699
  },
700
+ "8083": {
701
+ "content": "<extra_id_83>",
702
  "lstrip": false,
703
  "normalized": false,
704
  "rstrip": false,
705
  "single_word": false,
706
  "special": true
707
  },
708
+ "8084": {
709
+ "content": "<extra_id_84>",
710
  "lstrip": false,
711
  "normalized": false,
712
  "rstrip": false,
713
  "single_word": false,
714
  "special": true
715
  },
716
+ "8085": {
717
+ "content": "<extra_id_85>",
718
  "lstrip": false,
719
  "normalized": false,
720
  "rstrip": false,
721
  "single_word": false,
722
  "special": true
723
  },
724
+ "8086": {
725
+ "content": "<extra_id_86>",
726
  "lstrip": false,
727
  "normalized": false,
728
  "rstrip": false,
729
  "single_word": false,
730
  "special": true
731
  },
732
+ "8087": {
733
+ "content": "<extra_id_87>",
734
  "lstrip": false,
735
  "normalized": false,
736
  "rstrip": false,
737
  "single_word": false,
738
  "special": true
739
  },
740
+ "8088": {
741
+ "content": "<extra_id_88>",
742
  "lstrip": false,
743
  "normalized": false,
744
  "rstrip": false,
745
  "single_word": false,
746
  "special": true
747
  },
748
+ "8089": {
749
+ "content": "<extra_id_89>",
750
  "lstrip": false,
751
  "normalized": false,
752
  "rstrip": false,
753
  "single_word": false,
754
  "special": true
755
  },
756
+ "8090": {
757
+ "content": "<extra_id_90>",
758
  "lstrip": false,
759
  "normalized": false,
760
  "rstrip": false,
761
  "single_word": false,
762
  "special": true
763
  },
764
+ "8091": {
765
+ "content": "<extra_id_91>",
766
  "lstrip": false,
767
  "normalized": false,
768
  "rstrip": false,
769
  "single_word": false,
770
  "special": true
771
  },
772
+ "8092": {
773
+ "content": "<extra_id_92>",
774
  "lstrip": false,
775
  "normalized": false,
776
  "rstrip": false,
777
  "single_word": false,
778
  "special": true
779
  },
780
+ "8093": {
781
+ "content": "<extra_id_93>",
782
  "lstrip": false,
783
  "normalized": false,
784
  "rstrip": false,
785
  "single_word": false,
786
  "special": true
787
  },
788
+ "8094": {
789
+ "content": "<extra_id_94>",
790
  "lstrip": false,
791
  "normalized": false,
792
  "rstrip": false,
793
  "single_word": false,
794
  "special": true
795
  },
796
+ "8095": {
797
+ "content": "<extra_id_95>",
798
  "lstrip": false,
799
  "normalized": false,
800
  "rstrip": false,
801
  "single_word": false,
802
  "special": true
803
  },
804
+ "8096": {
805
+ "content": "<extra_id_96>",
806
  "lstrip": false,
807
  "normalized": false,
808
  "rstrip": false,
809
  "single_word": false,
810
  "special": true
811
  },
812
+ "8097": {
813
+ "content": "<extra_id_97>",
814
  "lstrip": false,
815
  "normalized": false,
816
  "rstrip": false,
817
  "single_word": false,
818
  "special": true
819
  },
820
+ "8098": {
821
+ "content": "<extra_id_98>",
822
+ "lstrip": false,
823
+ "normalized": false,
824
+ "rstrip": false,
825
+ "single_word": false,
826
+ "special": true
827
+ },
828
+ "8099": {
829
+ "content": "<extra_id_99>",
830
  "lstrip": false,
831
  "normalized": false,
832
  "rstrip": false,
 
936
  "<extra_id_98>",
937
  "<extra_id_99>"
938
  ],
939
+ "bos_token": "<s>",
940
+ "clean_up_tokenization_spaces": false,
941
  "eos_token": "</s>",
942
  "extra_ids": 100,
943
  "extra_special_tokens": {},
944
+ "model_max_length": 1000000000000000019884624838656,
945
  "pad_token": "<pad>",
946
+ "tokenizer_class": "T5TokenizerFast",
947
  "unk_token": "<unk>"
948
  }
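Note on the tokenizer_config.json change above: read together, the removed and added lines suggest that the sentinel tokens moved from a descending layout at the top of a 32100-entry vocabulary (`<extra_id_n>` at id 32099 - n) to an ascending layout that starts at id 8000 (`<extra_id_n>` at id 8000 + n). `model_max_length` also changed from 512 to 1000000000000000019884624838656, which equals `int(1e30)`; to the best of my knowledge that is the placeholder transformers writes when no maximum length is configured. The sketch below only restates this arithmetic. The helper names are hypothetical and not part of the repository, and the base id of 8000 is inferred from the ids in the diff rather than stated by the commit.

```python
from typing import Dict

# Hypothetical helpers for illustration; none of these names exist in the repository.

OLD_TOP_ID = 32099   # id of <extra_id_0> in the removed lines
NEW_BASE_ID = 8000   # id of <extra_id_0> in the added lines (inferred from the diff)
NUM_SENTINELS = 100  # "extra_ids": 100 in both versions of the file


def old_sentinel_id(n: int) -> int:
    """Old layout: sentinel ids count down from the top of the vocabulary."""
    return OLD_TOP_ID - n


def new_sentinel_id(n: int) -> int:
    """New layout: sentinel ids count up from the first id after the base vocabulary."""
    return NEW_BASE_ID + n


def remap_table() -> Dict[int, int]:
    """Map every old sentinel id to its id in the new layout."""
    return {old_sentinel_id(n): new_sentinel_id(n) for n in range(NUM_SENTINELS)}


# The new model_max_length is exactly int(1e30), i.e. no explicit limit is recorded.
NO_LIMIT = int(1e30)

if __name__ == "__main__":
    table = remap_table()
    print(table[32050])  # new id of <extra_id_49>
    print(NO_LIMIT == 1000000000000000019884624838656)
```

If this reading is right, code that previously relied on the 512-token default may need to pass `max_length` explicitly when truncating, and any cached token ids containing sentinels from the old layout would not be valid under the new one.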
checkpoints/checkpoint-62228/trainer_state.json CHANGED
@@ -11,870 +11,870 @@
11
  "log_history": [
12
  {
13
  "epoch": 0.016069936363052,
14
- "grad_norm": 0.3969729542732239,
15
- "learning_rate": 4.960146557819631e-05,
16
- "loss": 2.05,
17
  "step": 500
18
  },
19
  {
20
  "epoch": 0.032139872726104,
21
- "grad_norm": 0.3822907507419586,
22
- "learning_rate": 4.919971716912001e-05,
23
- "loss": 1.1207,
24
  "step": 1000
25
  },
26
  {
27
  "epoch": 0.04820980908915601,
28
- "grad_norm": 0.36019280552864075,
29
- "learning_rate": 4.879796876004371e-05,
30
- "loss": 0.9225,
31
  "step": 1500
32
  },
33
  {
34
  "epoch": 0.064279745452208,
35
- "grad_norm": 0.30364033579826355,
36
- "learning_rate": 4.8396220350967415e-05,
37
- "loss": 0.8244,
38
  "step": 2000
39
  },
40
  {
41
  "epoch": 0.08034968181526002,
42
- "grad_norm": 0.45634394884109497,
43
- "learning_rate": 4.799447194189111e-05,
44
- "loss": 0.7506,
45
  "step": 2500
46
  },
47
  {
48
  "epoch": 0.09641961817831202,
49
- "grad_norm": 0.3562425374984741,
50
- "learning_rate": 4.759272353281481e-05,
51
- "loss": 0.7012,
52
  "step": 3000
53
  },
54
  {
55
  "epoch": 0.11248955454136401,
56
- "grad_norm": 0.33726808428764343,
57
- "learning_rate": 4.719097512373851e-05,
58
- "loss": 0.6706,
59
  "step": 3500
60
  },
61
  {
62
  "epoch": 0.128559490904416,
63
- "grad_norm": 0.30098849534988403,
64
- "learning_rate": 4.678922671466221e-05,
65
- "loss": 0.6308,
66
  "step": 4000
67
  },
68
  {
69
  "epoch": 0.14462942726746802,
70
- "grad_norm": 0.29443585872650146,
71
- "learning_rate": 4.6387478305585915e-05,
72
- "loss": 0.6141,
73
  "step": 4500
74
  },
75
  {
76
  "epoch": 0.16069936363052004,
77
- "grad_norm": 0.25647810101509094,
78
- "learning_rate": 4.598572989650961e-05,
79
- "loss": 0.5866,
80
  "step": 5000
81
  },
82
  {
83
  "epoch": 0.17676929999357202,
84
- "grad_norm": 0.2516370415687561,
85
- "learning_rate": 4.558398148743331e-05,
86
- "loss": 0.5665,
87
  "step": 5500
88
  },
89
  {
90
  "epoch": 0.19283923635662403,
91
- "grad_norm": 0.3337278366088867,
92
- "learning_rate": 4.518223307835701e-05,
93
- "loss": 0.5427,
94
  "step": 6000
95
  },
96
  {
97
  "epoch": 0.20890917271967602,
98
- "grad_norm": 0.2592964470386505,
99
- "learning_rate": 4.478048466928072e-05,
100
- "loss": 0.5323,
101
  "step": 6500
102
  },
103
  {
104
  "epoch": 0.22497910908272803,
105
- "grad_norm": 0.28550606966018677,
106
- "learning_rate": 4.437873626020441e-05,
107
- "loss": 0.5187,
108
  "step": 7000
109
  },
110
  {
111
  "epoch": 0.24104904544578004,
112
- "grad_norm": 0.26474013924598694,
113
- "learning_rate": 4.397698785112811e-05,
114
- "loss": 0.5058,
115
  "step": 7500
116
  },
117
  {
118
  "epoch": 0.257118981808832,
119
- "grad_norm": 0.3018198013305664,
120
- "learning_rate": 4.3575239442051814e-05,
121
- "loss": 0.5013,
122
  "step": 8000
123
  },
124
  {
125
  "epoch": 0.27318891817188407,
126
- "grad_norm": 0.2628585994243622,
127
- "learning_rate": 4.317349103297551e-05,
128
- "loss": 0.4883,
129
  "step": 8500
130
  },
131
  {
132
  "epoch": 0.28925885453493605,
133
- "grad_norm": 0.30172979831695557,
134
- "learning_rate": 4.277174262389921e-05,
135
- "loss": 0.4795,
136
  "step": 9000
137
  },
138
  {
139
  "epoch": 0.30532879089798803,
140
- "grad_norm": 0.25293004512786865,
141
- "learning_rate": 4.236999421482291e-05,
142
- "loss": 0.4682,
143
  "step": 9500
144
  },
145
  {
146
  "epoch": 0.3213987272610401,
147
- "grad_norm": 0.2726214528083801,
148
- "learning_rate": 4.196824580574661e-05,
149
- "loss": 0.4641,
150
  "step": 10000
151
  },
152
  {
153
  "epoch": 0.33746866362409206,
154
- "grad_norm": 0.2570224106311798,
155
- "learning_rate": 4.1566497396670314e-05,
156
- "loss": 0.4556,
157
  "step": 10500
158
  },
159
  {
160
  "epoch": 0.35353859998714404,
161
- "grad_norm": 0.26380738615989685,
162
- "learning_rate": 4.1164748987594006e-05,
163
- "loss": 0.449,
164
  "step": 11000
165
  },
166
  {
167
  "epoch": 0.369608536350196,
168
- "grad_norm": 0.2555176913738251,
169
- "learning_rate": 4.076300057851771e-05,
170
- "loss": 0.4412,
171
  "step": 11500
172
  },
173
  {
174
  "epoch": 0.38567847271324807,
175
- "grad_norm": 0.2122594565153122,
176
- "learning_rate": 4.036125216944141e-05,
177
- "loss": 0.4365,
178
  "step": 12000
179
  },
180
  {
181
  "epoch": 0.40174840907630005,
182
- "grad_norm": 0.2333071529865265,
183
- "learning_rate": 3.9959503760365116e-05,
184
- "loss": 0.433,
185
  "step": 12500
186
  },
187
  {
188
  "epoch": 0.41781834543935203,
189
- "grad_norm": 0.24873752892017365,
190
- "learning_rate": 3.955775535128881e-05,
191
- "loss": 0.4283,
192
  "step": 13000
193
  },
194
  {
195
  "epoch": 0.4338882818024041,
196
- "grad_norm": 0.32416871190071106,
197
- "learning_rate": 3.915600694221251e-05,
198
- "loss": 0.4218,
199
  "step": 13500
200
  },
201
  {
202
  "epoch": 0.44995821816545606,
203
- "grad_norm": 0.23515433073043823,
204
- "learning_rate": 3.875425853313621e-05,
205
- "loss": 0.4139,
206
  "step": 14000
207
  },
208
  {
209
  "epoch": 0.46602815452850804,
210
- "grad_norm": 0.22002151608467102,
211
- "learning_rate": 3.8353313620878064e-05,
212
- "loss": 0.417,
213
  "step": 14500
214
  },
215
  {
216
  "epoch": 0.4820980908915601,
217
- "grad_norm": 0.251897931098938,
218
- "learning_rate": 3.795156521180176e-05,
219
- "loss": 0.4106,
220
  "step": 15000
221
  },
222
  {
223
  "epoch": 0.49816802725461207,
224
- "grad_norm": 0.26212435960769653,
225
- "learning_rate": 3.754981680272546e-05,
226
- "loss": 0.4037,
227
  "step": 15500
228
  },
229
  {
230
  "epoch": 0.514237963617664,
231
- "grad_norm": 0.2718159258365631,
232
  "learning_rate": 3.714887189046731e-05,
233
- "loss": 0.402,
234
  "step": 16000
235
  },
236
  {
237
  "epoch": 0.530307899980716,
238
- "grad_norm": 0.23812739551067352,
239
- "learning_rate": 3.674712348139102e-05,
240
- "loss": 0.3953,
241
  "step": 16500
242
  },
243
  {
244
  "epoch": 0.5463778363437681,
245
- "grad_norm": 0.21076083183288574,
246
- "learning_rate": 3.634537507231471e-05,
247
- "loss": 0.3938,
248
  "step": 17000
249
  },
250
  {
251
  "epoch": 0.5624477727068201,
252
- "grad_norm": 0.25489869713783264,
253
- "learning_rate": 3.5943626663238416e-05,
254
- "loss": 0.3921,
255
  "step": 17500
256
  },
257
  {
258
  "epoch": 0.5785177090698721,
259
- "grad_norm": 0.24057357013225555,
260
- "learning_rate": 3.5541878254162115e-05,
261
- "loss": 0.3867,
262
  "step": 18000
263
  },
264
  {
265
  "epoch": 0.5945876454329241,
266
- "grad_norm": 0.24298915266990662,
267
- "learning_rate": 3.514012984508582e-05,
268
- "loss": 0.3868,
269
  "step": 18500
270
  },
271
  {
272
  "epoch": 0.6106575817959761,
273
- "grad_norm": 0.2183919996023178,
274
- "learning_rate": 3.473838143600951e-05,
275
- "loss": 0.3803,
276
  "step": 19000
277
  },
278
  {
279
  "epoch": 0.626727518159028,
280
- "grad_norm": 0.2278251349925995,
281
- "learning_rate": 3.433663302693321e-05,
282
- "loss": 0.3775,
283
  "step": 19500
284
  },
285
  {
286
  "epoch": 0.6427974545220801,
287
- "grad_norm": 0.240201935172081,
288
- "learning_rate": 3.393568811467507e-05,
289
- "loss": 0.3751,
290
  "step": 20000
291
  },
292
  {
293
  "epoch": 0.6588673908851321,
294
- "grad_norm": 0.21118561923503876,
295
- "learning_rate": 3.353393970559877e-05,
296
- "loss": 0.3742,
297
  "step": 20500
298
  },
299
  {
300
  "epoch": 0.6749373272481841,
301
- "grad_norm": 0.22640825808048248,
302
- "learning_rate": 3.313219129652247e-05,
303
- "loss": 0.3729,
304
  "step": 21000
305
  },
306
  {
307
  "epoch": 0.6910072636112361,
308
- "grad_norm": 0.23105542361736298,
309
- "learning_rate": 3.2730442887446166e-05,
310
- "loss": 0.3687,
311
  "step": 21500
312
  },
313
  {
314
  "epoch": 0.7070771999742881,
315
- "grad_norm": 0.24791008234024048,
316
- "learning_rate": 3.2329497975188024e-05,
317
- "loss": 0.3658,
318
  "step": 22000
319
  },
320
  {
321
  "epoch": 0.7231471363373401,
322
- "grad_norm": 0.2497881054878235,
323
- "learning_rate": 3.1928553062929875e-05,
324
- "loss": 0.3646,
325
  "step": 22500
326
  },
327
  {
328
  "epoch": 0.739217072700392,
329
- "grad_norm": 0.2395261973142624,
330
- "learning_rate": 3.152680465385357e-05,
331
- "loss": 0.3655,
332
  "step": 23000
333
  },
334
  {
335
  "epoch": 0.7552870090634441,
336
- "grad_norm": 0.21194589138031006,
337
- "learning_rate": 3.112505624477727e-05,
338
- "loss": 0.3646,
339
  "step": 23500
340
  },
341
  {
342
  "epoch": 0.7713569454264961,
343
- "grad_norm": 0.21682508289813995,
344
- "learning_rate": 3.072330783570097e-05,
345
- "loss": 0.3629,
346
  "step": 24000
347
  },
348
  {
349
  "epoch": 0.7874268817895481,
350
- "grad_norm": 0.23710566759109497,
351
- "learning_rate": 3.0321559426624674e-05,
352
- "loss": 0.3583,
353
  "step": 24500
354
  },
355
  {
356
  "epoch": 0.8034968181526001,
357
- "grad_norm": 0.23857219517230988,
358
- "learning_rate": 2.9919811017548372e-05,
359
- "loss": 0.3561,
360
  "step": 25000
361
  },
362
  {
363
  "epoch": 0.8195667545156521,
364
- "grad_norm": 0.241951584815979,
365
- "learning_rate": 2.9518062608472075e-05,
366
- "loss": 0.3537,
367
  "step": 25500
368
  },
369
  {
370
  "epoch": 0.8356366908787041,
371
- "grad_norm": 0.275765061378479,
372
- "learning_rate": 2.9116314199395773e-05,
373
- "loss": 0.3493,
374
  "step": 26000
375
  },
376
  {
377
  "epoch": 0.8517066272417562,
378
- "grad_norm": 0.24757184088230133,
379
  "learning_rate": 2.871536928713762e-05,
380
- "loss": 0.3486,
381
  "step": 26500
382
  },
383
  {
384
  "epoch": 0.8677765636048081,
385
- "grad_norm": 0.21833688020706177,
386
  "learning_rate": 2.8313620878061327e-05,
387
- "loss": 0.3461,
388
  "step": 27000
389
  },
390
  {
391
  "epoch": 0.8838464999678601,
392
- "grad_norm": 0.21623168885707855,
393
  "learning_rate": 2.7911872468985022e-05,
394
- "loss": 0.3468,
395
  "step": 27500
396
  },
397
  {
398
  "epoch": 0.8999164363309121,
399
- "grad_norm": 0.20861521363258362,
400
  "learning_rate": 2.7510124059908728e-05,
401
- "loss": 0.3481,
402
  "step": 28000
403
  },
404
  {
405
  "epoch": 0.9159863726939641,
406
- "grad_norm": 0.20291315019130707,
407
- "learning_rate": 2.7108375650832423e-05,
408
- "loss": 0.3474,
409
  "step": 28500
410
  },
411
  {
412
  "epoch": 0.9320563090570161,
413
- "grad_norm": 0.2101660966873169,
414
  "learning_rate": 2.6707430738574275e-05,
415
- "loss": 0.3412,
416
  "step": 29000
417
  },
418
  {
419
  "epoch": 0.9481262454200682,
420
- "grad_norm": 0.23224739730358124,
421
  "learning_rate": 2.6305682329497977e-05,
422
- "loss": 0.3422,
423
  "step": 29500
424
  },
425
  {
426
  "epoch": 0.9641961817831202,
427
- "grad_norm": 0.22987599670886993,
428
  "learning_rate": 2.5903933920421676e-05,
429
- "loss": 0.3407,
430
  "step": 30000
431
  },
432
  {
433
  "epoch": 0.9802661181461721,
434
- "grad_norm": 0.22307533025741577,
435
- "learning_rate": 2.5502185511345378e-05,
436
- "loss": 0.3365,
437
  "step": 30500
438
  },
439
  {
440
  "epoch": 0.9963360545092241,
441
- "grad_norm": 0.20577801764011383,
442
  "learning_rate": 2.510124059908723e-05,
443
- "loss": 0.3409,
444
  "step": 31000
445
  },
446
  {
447
  "epoch": 1.0124059908722762,
448
- "grad_norm": 0.23968417942523956,
449
  "learning_rate": 2.4699492190010928e-05,
450
- "loss": 0.339,
451
  "step": 31500
452
  },
453
  {
454
  "epoch": 1.028475927235328,
455
- "grad_norm": 0.2166174054145813,
456
  "learning_rate": 2.429774378093463e-05,
457
- "loss": 0.3317,
458
  "step": 32000
459
  },
460
  {
461
  "epoch": 1.0445458635983802,
462
- "grad_norm": 0.22259151935577393,
463
  "learning_rate": 2.389599537185833e-05,
464
- "loss": 0.3404,
465
  "step": 32500
466
  },
467
  {
468
  "epoch": 1.060615799961432,
469
- "grad_norm": 0.2585219442844391,
470
- "learning_rate": 2.3495050459600184e-05,
471
- "loss": 0.3322,
472
  "step": 33000
473
  },
474
  {
475
  "epoch": 1.0766857363244842,
476
- "grad_norm": 0.23949937522411346,
477
- "learning_rate": 2.3093302050523882e-05,
478
- "loss": 0.3332,
479
  "step": 33500
480
  },
481
  {
482
  "epoch": 1.0927556726875363,
483
- "grad_norm": 0.2360944151878357,
484
  "learning_rate": 2.269155364144758e-05,
485
- "loss": 0.3374,
486
  "step": 34000
487
  },
488
  {
489
  "epoch": 1.1088256090505881,
490
- "grad_norm": 0.23383018374443054,
491
  "learning_rate": 2.228980523237128e-05,
492
- "loss": 0.3287,
493
  "step": 34500
494
  },
495
  {
496
  "epoch": 1.1248955454136402,
497
- "grad_norm": 0.25602060556411743,
498
- "learning_rate": 2.1888860320113135e-05,
499
- "loss": 0.3262,
500
  "step": 35000
501
  },
502
  {
503
  "epoch": 1.140965481776692,
504
- "grad_norm": 0.2233658730983734,
505
- "learning_rate": 2.1487111911036833e-05,
506
- "loss": 0.3294,
507
  "step": 35500
508
  },
509
  {
510
  "epoch": 1.1570354181397442,
511
- "grad_norm": 0.23545712232589722,
512
- "learning_rate": 2.1085363501960532e-05,
513
- "loss": 0.3263,
514
  "step": 36000
515
  },
516
  {
517
  "epoch": 1.173105354502796,
518
- "grad_norm": 0.22479598224163055,
519
- "learning_rate": 2.0683615092884234e-05,
520
- "loss": 0.328,
521
  "step": 36500
522
  },
523
  {
524
  "epoch": 1.1891752908658482,
525
- "grad_norm": 0.22207121551036835,
526
- "learning_rate": 2.0282670180626086e-05,
527
- "loss": 0.3275,
528
  "step": 37000
529
  },
530
  {
531
  "epoch": 1.2052452272289003,
532
- "grad_norm": 0.23822110891342163,
533
- "learning_rate": 1.9880921771549785e-05,
534
- "loss": 0.3273,
535
  "step": 37500
536
  },
537
  {
538
  "epoch": 1.2213151635919521,
539
- "grad_norm": 0.23664866387844086,
540
- "learning_rate": 1.9479173362473487e-05,
541
- "loss": 0.318,
542
  "step": 38000
543
  },
544
  {
545
  "epoch": 1.2373850999550042,
546
- "grad_norm": 0.18543508648872375,
547
- "learning_rate": 1.9077424953397185e-05,
548
- "loss": 0.3235,
549
  "step": 38500
550
  },
551
  {
552
  "epoch": 1.253455036318056,
553
- "grad_norm": 0.23305822908878326,
554
- "learning_rate": 1.8676480041139037e-05,
555
- "loss": 0.3243,
556
  "step": 39000
557
  },
558
  {
559
  "epoch": 1.2695249726811082,
560
- "grad_norm": 0.21699073910713196,
561
- "learning_rate": 1.827473163206274e-05,
562
- "loss": 0.3222,
563
  "step": 39500
564
  },
565
  {
566
  "epoch": 1.28559490904416,
567
- "grad_norm": 0.2757895588874817,
568
  "learning_rate": 1.7872983222986438e-05,
569
- "loss": 0.3248,
570
  "step": 40000
571
  },
572
  {
573
  "epoch": 1.3016648454072122,
574
- "grad_norm": 0.19769324362277985,
575
  "learning_rate": 1.7471234813910137e-05,
576
- "loss": 0.3179,
577
  "step": 40500
578
  },
579
  {
580
  "epoch": 1.3177347817702643,
581
- "grad_norm": 0.18964402377605438,
582
- "learning_rate": 1.707028990165199e-05,
583
- "loss": 0.3178,
584
  "step": 41000
585
  },
586
  {
587
  "epoch": 1.3338047181333161,
588
- "grad_norm": 0.2584107220172882,
589
- "learning_rate": 1.666854149257569e-05,
590
- "loss": 0.318,
591
  "step": 41500
592
  },
593
  {
594
  "epoch": 1.3498746544963682,
595
- "grad_norm": 0.25919750332832336,
596
- "learning_rate": 1.626759658031754e-05,
597
- "loss": 0.3205,
598
  "step": 42000
599
  },
600
  {
601
  "epoch": 1.3659445908594203,
602
- "grad_norm": 0.24371759593486786,
603
- "learning_rate": 1.5865848171241244e-05,
604
- "loss": 0.3186,
605
  "step": 42500
606
  },
607
  {
608
  "epoch": 1.3820145272224722,
609
- "grad_norm": 0.24457883834838867,
610
- "learning_rate": 1.5464099762164942e-05,
611
- "loss": 0.3162,
612
  "step": 43000
613
  },
614
  {
615
  "epoch": 1.398084463585524,
616
- "grad_norm": 0.1918337345123291,
617
- "learning_rate": 1.5062351353088641e-05,
618
- "loss": 0.3169,
619
  "step": 43500
620
  },
621
  {
622
  "epoch": 1.4141543999485762,
623
- "grad_norm": 0.2350657880306244,
624
- "learning_rate": 1.4660602944012342e-05,
625
- "loss": 0.3171,
626
  "step": 44000
627
  },
628
  {
629
  "epoch": 1.4302243363116283,
630
- "grad_norm": 0.2481279820203781,
631
- "learning_rate": 1.4258854534936042e-05,
632
- "loss": 0.3179,
633
  "step": 44500
634
  },
635
  {
636
  "epoch": 1.4462942726746801,
637
- "grad_norm": 0.21132701635360718,
638
- "learning_rate": 1.3857106125859743e-05,
639
- "loss": 0.3125,
640
  "step": 45000
641
  },
642
  {
643
  "epoch": 1.4623642090377322,
644
- "grad_norm": 0.20240716636180878,
645
- "learning_rate": 1.3455357716783443e-05,
646
- "loss": 0.3172,
647
  "step": 45500
648
  },
649
  {
650
  "epoch": 1.4784341454007843,
651
- "grad_norm": 0.2224823385477066,
652
- "learning_rate": 1.3054412804525296e-05,
653
- "loss": 0.3151,
654
  "step": 46000
655
  },
656
  {
657
  "epoch": 1.4945040817638362,
658
- "grad_norm": 0.19261781871318817,
659
  "learning_rate": 1.2652664395448997e-05,
660
- "loss": 0.312,
661
  "step": 46500
662
  },
663
  {
664
  "epoch": 1.510574018126888,
665
- "grad_norm": 0.16068917512893677,
666
  "learning_rate": 1.2250915986372695e-05,
667
- "loss": 0.3145,
668
  "step": 47000
669
  },
670
  {
671
  "epoch": 1.5266439544899402,
672
- "grad_norm": 0.18192972242832184,
673
  "learning_rate": 1.1849167577296394e-05,
674
- "loss": 0.3134,
675
  "step": 47500
676
  },
677
  {
678
  "epoch": 1.5427138908529923,
679
- "grad_norm": 0.19884943962097168,
680
- "learning_rate": 1.1448222665038247e-05,
681
- "loss": 0.3119,
682
  "step": 48000
683
  },
684
  {
685
  "epoch": 1.5587838272160441,
686
- "grad_norm": 0.1883106529712677,
687
- "learning_rate": 1.1046474255961948e-05,
688
- "loss": 0.316,
689
  "step": 48500
690
  },
691
  {
692
  "epoch": 1.5748537635790962,
693
- "grad_norm": 0.19331087172031403,
694
- "learning_rate": 1.0644725846885646e-05,
695
- "loss": 0.3135,
696
  "step": 49000
697
  },
698
  {
699
  "epoch": 1.5909236999421483,
700
- "grad_norm": 0.20041531324386597,
701
  "learning_rate": 1.0242977437809347e-05,
702
- "loss": 0.3112,
703
  "step": 49500
704
  },
705
  {
706
  "epoch": 1.6069936363052002,
707
- "grad_norm": 0.18530187010765076,
708
- "learning_rate": 9.8420325255512e-06,
709
- "loss": 0.3122,
710
  "step": 50000
711
  },
712
  {
713
  "epoch": 1.623063572668252,
714
- "grad_norm": 0.22725620865821838,
715
- "learning_rate": 9.4402841164749e-06,
716
- "loss": 0.3122,
717
  "step": 50500
718
  },
719
  {
720
  "epoch": 1.6391335090313044,
721
- "grad_norm": 0.23093479871749878,
722
- "learning_rate": 9.0385357073986e-06,
723
- "loss": 0.3149,
724
  "step": 51000
725
  },
726
  {
727
  "epoch": 1.6552034453943563,
728
- "grad_norm": 0.19580845534801483,
729
- "learning_rate": 8.6367872983223e-06,
730
- "loss": 0.3121,
731
  "step": 51500
732
  },
733
  {
734
  "epoch": 1.6712733817574081,
735
- "grad_norm": 0.1742846667766571,
736
- "learning_rate": 8.235842386064153e-06,
737
- "loss": 0.3094,
738
  "step": 52000
739
  },
740
  {
741
  "epoch": 1.6873433181204602,
742
- "grad_norm": 0.18685191869735718,
743
- "learning_rate": 7.834093976987852e-06,
744
- "loss": 0.309,
745
  "step": 52500
746
  },
747
  {
748
  "epoch": 1.7034132544835123,
749
- "grad_norm": 0.21959276497364044,
750
- "learning_rate": 7.432345567911551e-06,
751
- "loss": 0.3118,
752
  "step": 53000
753
  },
754
  {
755
  "epoch": 1.7194831908465642,
756
- "grad_norm": 0.1935770958662033,
757
- "learning_rate": 7.030597158835252e-06,
758
- "loss": 0.3106,
759
  "step": 53500
760
  },
761
  {
762
  "epoch": 1.7355531272096163,
763
- "grad_norm": 0.19977129995822906,
764
- "learning_rate": 6.629652246577103e-06,
765
- "loss": 0.3101,
766
  "step": 54000
767
  },
768
  {
769
  "epoch": 1.7516230635726684,
770
- "grad_norm": 0.2006288766860962,
771
- "learning_rate": 6.2279038375008035e-06,
772
- "loss": 0.3099,
773
  "step": 54500
774
  },
775
  {
776
  "epoch": 1.7676929999357203,
777
- "grad_norm": 0.19280743598937988,
778
- "learning_rate": 5.826155428424504e-06,
779
- "loss": 0.308,
780
  "step": 55000
781
  },
782
  {
783
  "epoch": 1.7837629362987721,
784
- "grad_norm": 0.22095157206058502,
785
- "learning_rate": 5.424407019348204e-06,
786
- "loss": 0.3069,
787
  "step": 55500
788
  },
789
  {
790
  "epoch": 1.7998328726618242,
791
- "grad_norm": 0.2091740071773529,
792
  "learning_rate": 5.022658610271903e-06,
793
- "loss": 0.3062,
794
  "step": 56000
795
  },
796
  {
797
  "epoch": 1.8159028090248763,
798
- "grad_norm": 0.24772244691848755,
799
  "learning_rate": 4.620910201195604e-06,
800
- "loss": 0.3093,
801
  "step": 56500
802
  },
803
  {
804
  "epoch": 1.8319727453879282,
805
- "grad_norm": 0.1973961740732193,
806
  "learning_rate": 4.219161792119303e-06,
807
- "loss": 0.309,
808
  "step": 57000
809
  },
810
  {
811
  "epoch": 1.8480426817509803,
812
- "grad_norm": 0.22767914831638336,
813
  "learning_rate": 3.817413383043003e-06,
814
- "loss": 0.3109,
815
  "step": 57500
816
  },
817
  {
818
  "epoch": 1.8641126181140324,
819
- "grad_norm": 0.21461111307144165,
820
- "learning_rate": 3.416468470784856e-06,
821
- "loss": 0.3075,
822
  "step": 58000
823
  },
824
  {
825
  "epoch": 1.8801825544770843,
826
- "grad_norm": 0.24607454240322113,
827
- "learning_rate": 3.0147200617085557e-06,
828
- "loss": 0.3058,
829
  "step": 58500
830
  },
831
  {
832
  "epoch": 1.8962524908401361,
833
- "grad_norm": 0.19667118787765503,
834
  "learning_rate": 2.6129716526322558e-06,
835
- "loss": 0.3072,
836
  "step": 59000
837
  },
838
  {
839
  "epoch": 1.9123224272031882,
840
- "grad_norm": 0.22604137659072876,
841
  "learning_rate": 2.211223243555956e-06,
842
- "loss": 0.3064,
843
  "step": 59500
844
  },
845
  {
846
  "epoch": 1.9283923635662403,
847
- "grad_norm": 0.1879967898130417,
848
- "learning_rate": 1.8102783312978082e-06,
849
- "loss": 0.3063,
850
  "step": 60000
851
  },
852
  {
853
  "epoch": 1.9444622999292922,
854
- "grad_norm": 0.21271295845508575,
855
- "learning_rate": 1.408529922221508e-06,
856
- "loss": 0.3076,
857
  "step": 60500
858
  },
859
  {
860
  "epoch": 1.9605322362923443,
861
- "grad_norm": 0.16714586317539215,
862
- "learning_rate": 1.006781513145208e-06,
863
- "loss": 0.3092,
864
  "step": 61000
865
  },
866
  {
867
  "epoch": 1.9766021726553964,
868
- "grad_norm": 0.20666128396987915,
869
- "learning_rate": 6.050331040689079e-07,
870
- "loss": 0.3076,
871
  "step": 61500
872
  },
873
  {
874
  "epoch": 1.9926721090184483,
875
- "grad_norm": 0.18590718507766724,
876
- "learning_rate": 2.0328469499260785e-07,
877
- "loss": 0.3063,
878
  "step": 62000
879
  }
880
  ],
@@ -895,7 +895,7 @@
895
  "attributes": {}
896
  }
897
  },
898
- "total_flos": 1.3475252326839091e+17,
899
  "train_batch_size": 32,
900
  "trial_name": null,
901
  "trial_params": null
 
11
  "log_history": [
12
  {
13
  "epoch": 0.016069936363052,
14
+ "grad_norm": 0.2569522559642792,
15
+ "learning_rate": 4.960307257183262e-05,
16
+ "loss": 2.9119,
17
  "step": 500
18
  },
19
  {
20
  "epoch": 0.032139872726104,
21
+ "grad_norm": 0.26731985807418823,
22
+ "learning_rate": 4.9201324162756315e-05,
23
+ "loss": 2.2886,
24
  "step": 1000
25
  },
26
  {
27
  "epoch": 0.04820980908915601,
28
+ "grad_norm": 0.3099210560321808,
29
+ "learning_rate": 4.8799575753680014e-05,
30
+ "loss": 2.1431,
31
  "step": 1500
32
  },
33
  {
34
  "epoch": 0.064279745452208,
35
+ "grad_norm": 0.28836730122566223,
36
+ "learning_rate": 4.839782734460372e-05,
37
+ "loss": 2.0369,
38
  "step": 2000
39
  },
40
  {
41
  "epoch": 0.08034968181526002,
42
+ "grad_norm": 0.4808545708656311,
43
+ "learning_rate": 4.799607893552742e-05,
44
+ "loss": 1.932,
45
  "step": 2500
46
  },
47
  {
48
  "epoch": 0.09641961817831202,
49
+ "grad_norm": 0.38000208139419556,
50
+ "learning_rate": 4.759433052645112e-05,
51
+ "loss": 1.7766,
52
  "step": 3000
53
  },
54
  {
55
  "epoch": 0.11248955454136401,
56
+ "grad_norm": 0.4310196340084076,
57
+ "learning_rate": 4.7192582117374816e-05,
58
+ "loss": 1.6022,
59
  "step": 3500
60
  },
61
  {
62
  "epoch": 0.128559490904416,
63
+ "grad_norm": 0.40425005555152893,
64
+ "learning_rate": 4.6790833708298515e-05,
65
+ "loss": 1.4576,
66
  "step": 4000
67
  },
68
  {
69
  "epoch": 0.14462942726746802,
70
+ "grad_norm": 0.3811793327331543,
71
+ "learning_rate": 4.638908529922222e-05,
72
+ "loss": 1.3384,
73
  "step": 4500
74
  },
75
  {
76
  "epoch": 0.16069936363052004,
77
+ "grad_norm": 0.38943949341773987,
78
+ "learning_rate": 4.598733689014591e-05,
79
+ "loss": 1.2233,
80
  "step": 5000
81
  },
82
  {
83
  "epoch": 0.17676929999357202,
84
+ "grad_norm": 0.5517480373382568,
85
+ "learning_rate": 4.558558848106962e-05,
86
+ "loss": 1.1342,
87
  "step": 5500
88
  },
89
  {
90
  "epoch": 0.19283923635662403,
91
+ "grad_norm": 0.4235232174396515,
92
+ "learning_rate": 4.518384007199332e-05,
93
+ "loss": 1.0432,
94
  "step": 6000
95
  },
96
  {
97
  "epoch": 0.20890917271967602,
98
+ "grad_norm": 0.4617592692375183,
99
+ "learning_rate": 4.478209166291702e-05,
100
+ "loss": 0.9781,
101
  "step": 6500
102
  },
103
  {
104
  "epoch": 0.22497910908272803,
105
+ "grad_norm": 0.5447149872779846,
106
+ "learning_rate": 4.4380343253840714e-05,
107
+ "loss": 0.927,
108
  "step": 7000
109
  },
110
  {
111
  "epoch": 0.24104904544578004,
112
+ "grad_norm": 0.4740816354751587,
113
+ "learning_rate": 4.397859484476441e-05,
114
+ "loss": 0.8674,
115
  "step": 7500
116
  },
117
  {
118
  "epoch": 0.257118981808832,
119
+ "grad_norm": 0.5207423567771912,
120
+ "learning_rate": 4.357684643568812e-05,
121
+ "loss": 0.8149,
122
  "step": 8000
123
  },
124
  {
125
  "epoch": 0.27318891817188407,
126
+ "grad_norm": 0.47738897800445557,
127
+ "learning_rate": 4.317509802661182e-05,
128
+ "loss": 0.7685,
129
  "step": 8500
130
  },
131
  {
132
  "epoch": 0.28925885453493605,
133
+ "grad_norm": 0.4176841676235199,
134
+ "learning_rate": 4.2773349617535516e-05,
135
+ "loss": 0.7119,
136
  "step": 9000
137
  },
138
  {
139
  "epoch": 0.30532879089798803,
140
+ "grad_norm": 0.381345272064209,
141
+ "learning_rate": 4.2371601208459215e-05,
142
+ "loss": 0.6682,
143
  "step": 9500
144
  },
145
  {
146
  "epoch": 0.3213987272610401,
147
+ "grad_norm": 0.6301918625831604,
148
+ "learning_rate": 4.1969852799382914e-05,
149
+ "loss": 0.6505,
150
  "step": 10000
151
  },
152
  {
153
  "epoch": 0.33746866362409206,
154
+ "grad_norm": 0.4057278335094452,
155
+ "learning_rate": 4.156810439030662e-05,
156
+ "loss": 0.6063,
157
  "step": 10500
158
  },
159
  {
160
  "epoch": 0.35353859998714404,
161
+ "grad_norm": 0.5442121624946594,
162
+ "learning_rate": 4.116635598123031e-05,
163
+ "loss": 0.5735,
164
  "step": 11000
165
  },
166
  {
167
  "epoch": 0.369608536350196,
168
+ "grad_norm": 0.5113051533699036,
169
+ "learning_rate": 4.076460757215402e-05,
170
+ "loss": 0.5432,
171
  "step": 11500
172
  },
173
  {
174
  "epoch": 0.38567847271324807,
175
+ "grad_norm": 0.6383316516876221,
176
+ "learning_rate": 4.0362859163077716e-05,
177
+ "loss": 0.5143,
178
  "step": 12000
179
  },
180
  {
181
  "epoch": 0.40174840907630005,
182
+ "grad_norm": 0.4316321611404419,
183
+ "learning_rate": 3.996111075400142e-05,
184
+ "loss": 0.4867,
185
  "step": 12500
186
  },
187
  {
188
  "epoch": 0.41781834543935203,
189
+ "grad_norm": 0.42703017592430115,
190
+ "learning_rate": 3.955936234492511e-05,
191
+ "loss": 0.4614,
192
  "step": 13000
193
  },
194
  {
195
  "epoch": 0.4338882818024041,
196
+ "grad_norm": 0.4263227880001068,
197
+ "learning_rate": 3.915761393584881e-05,
198
+ "loss": 0.4391,
199
  "step": 13500
200
  },
201
  {
202
  "epoch": 0.44995821816545606,
203
+ "grad_norm": 0.47577473521232605,
204
+ "learning_rate": 3.875586552677252e-05,
205
+ "loss": 0.4241,
206
  "step": 14000
207
  },
208
  {
209
  "epoch": 0.46602815452850804,
210
+ "grad_norm": 0.3419073224067688,
211
+ "learning_rate": 3.8354117117696216e-05,
212
+ "loss": 0.4019,
213
  "step": 14500
214
  },
215
  {
216
  "epoch": 0.4820980908915601,
217
+ "grad_norm": 0.3402538001537323,
218
+ "learning_rate": 3.7952368708619915e-05,
219
+ "loss": 0.3876,
220
  "step": 15000
221
  },
222
  {
223
  "epoch": 0.49816802725461207,
224
+ "grad_norm": 0.7072747349739075,
225
+ "learning_rate": 3.7550620299543614e-05,
226
+ "loss": 0.364,
227
  "step": 15500
228
  },
229
  {
230
  "epoch": 0.514237963617664,
231
+ "grad_norm": 0.31305554509162903,
232
  "learning_rate": 3.714887189046731e-05,
233
+ "loss": 0.3463,
234
  "step": 16000
235
  },
236
  {
237
  "epoch": 0.530307899980716,
238
+ "grad_norm": 0.4203876554965973,
239
+ "learning_rate": 3.674792697820917e-05,
240
+ "loss": 0.3371,
241
  "step": 16500
242
  },
243
  {
244
  "epoch": 0.5463778363437681,
245
+ "grad_norm": 0.49149152636528015,
246
+ "learning_rate": 3.634617856913286e-05,
247
+ "loss": 0.3189,
248
  "step": 17000
249
  },
250
  {
251
  "epoch": 0.5624477727068201,
252
+ "grad_norm": 0.6438118815422058,
253
+ "learning_rate": 3.594443016005657e-05,
254
+ "loss": 0.3074,
255
  "step": 17500
256
  },
257
  {
258
  "epoch": 0.5785177090698721,
259
+ "grad_norm": 0.6619039177894592,
260
+ "learning_rate": 3.554268175098027e-05,
261
+ "loss": 0.2989,
262
  "step": 18000
263
  },
264
  {
265
  "epoch": 0.5945876454329241,
266
+ "grad_norm": 0.39272341132164,
267
+ "learning_rate": 3.514093334190397e-05,
268
+ "loss": 0.2818,
269
  "step": 18500
270
  },
271
  {
272
  "epoch": 0.6106575817959761,
273
+ "grad_norm": 0.3980565369129181,
274
+ "learning_rate": 3.473998842964582e-05,
275
+ "loss": 0.273,
276
  "step": 19000
277
  },
278
  {
279
  "epoch": 0.626727518159028,
280
+ "grad_norm": 0.3052268922328949,
281
+ "learning_rate": 3.4338240020569516e-05,
282
+ "loss": 0.2677,
283
  "step": 19500
284
  },
285
  {
286
  "epoch": 0.6427974545220801,
287
+ "grad_norm": 0.5999760031700134,
288
+ "learning_rate": 3.3937295108311374e-05,
289
+ "loss": 0.2572,
290
  "step": 20000
291
  },
292
  {
293
  "epoch": 0.6588673908851321,
294
+ "grad_norm": 0.4283508062362671,
295
+ "learning_rate": 3.3536350196053226e-05,
296
+ "loss": 0.2468,
297
  "step": 20500
298
  },
299
  {
300
  "epoch": 0.6749373272481841,
301
+ "grad_norm": 0.4289894700050354,
302
+ "learning_rate": 3.3134601786976924e-05,
303
+ "loss": 0.2414,
304
  "step": 21000
305
  },
306
  {
307
  "epoch": 0.6910072636112361,
308
+ "grad_norm": 0.26386120915412903,
309
+ "learning_rate": 3.273285337790062e-05,
310
+ "loss": 0.2422,
311
  "step": 21500
312
  },
313
  {
314
  "epoch": 0.7070771999742881,
315
+ "grad_norm": 0.41095244884490967,
316
+ "learning_rate": 3.233110496882433e-05,
317
+ "loss": 0.2282,
318
  "step": 22000
319
  },
320
  {
321
  "epoch": 0.7231471363373401,
322
+ "grad_norm": 0.29514652490615845,
323
+ "learning_rate": 3.192935655974803e-05,
324
+ "loss": 0.2252,
325
  "step": 22500
326
  },
327
  {
328
  "epoch": 0.739217072700392,
329
+ "grad_norm": 0.4044126570224762,
330
+ "learning_rate": 3.152760815067172e-05,
331
+ "loss": 0.2211,
332
  "step": 23000
333
  },
334
  {
335
  "epoch": 0.7552870090634441,
336
+ "grad_norm": 0.3767038881778717,
337
+ "learning_rate": 3.1125859741595425e-05,
338
+ "loss": 0.2115,
339
  "step": 23500
340
  },
341
  {
342
  "epoch": 0.7713569454264961,
343
+ "grad_norm": 0.36812517046928406,
344
+ "learning_rate": 3.0724111332519124e-05,
345
+ "loss": 0.2059,
346
  "step": 24000
347
  },
348
  {
349
  "epoch": 0.7874268817895481,
350
+ "grad_norm": 0.3709106147289276,
351
+ "learning_rate": 3.0322362923442826e-05,
352
+ "loss": 0.2035,
353
  "step": 24500
354
  },
355
  {
356
  "epoch": 0.8034968181526001,
357
+ "grad_norm": 0.3285115361213684,
358
+ "learning_rate": 2.9920614514366525e-05,
359
+ "loss": 0.1993,
360
  "step": 25000
361
  },
362
  {
363
  "epoch": 0.8195667545156521,
364
+ "grad_norm": 0.3229790925979614,
365
+ "learning_rate": 2.9518866105290227e-05,
366
+ "loss": 0.1968,
367
  "step": 25500
368
  },
369
  {
370
  "epoch": 0.8356366908787041,
371
+ "grad_norm": 0.37397509813308716,
372
+ "learning_rate": 2.9117117696213926e-05,
373
+ "loss": 0.194,
374
  "step": 26000
375
  },
376
  {
377
  "epoch": 0.8517066272417562,
378
+ "grad_norm": 0.33143311738967896,
379
  "learning_rate": 2.871536928713762e-05,
380
+ "loss": 0.1875,
381
  "step": 26500
382
  },
383
  {
384
  "epoch": 0.8677765636048081,
385
+ "grad_norm": 0.2748125493526459,
386
  "learning_rate": 2.8313620878061327e-05,
387
+ "loss": 0.1854,
388
  "step": 27000
389
  },
390
  {
391
  "epoch": 0.8838464999678601,
392
+ "grad_norm": 0.2606910169124603,
393
  "learning_rate": 2.7911872468985022e-05,
394
+ "loss": 0.1809,
395
  "step": 27500
396
  },
397
  {
398
  "epoch": 0.8999164363309121,
399
+ "grad_norm": 0.28182655572891235,
400
  "learning_rate": 2.7510124059908728e-05,
401
+ "loss": 0.1815,
402
  "step": 28000
403
  },
404
  {
405
  "epoch": 0.9159863726939641,
406
+ "grad_norm": 0.3056446313858032,
407
+ "learning_rate": 2.7109179147650576e-05,
408
+ "loss": 0.1775,
409
  "step": 28500
410
  },
411
  {
412
  "epoch": 0.9320563090570161,
413
+ "grad_norm": 0.2458430379629135,
414
  "learning_rate": 2.6707430738574275e-05,
415
+ "loss": 0.1714,
416
  "step": 29000
417
  },
418
  {
419
  "epoch": 0.9481262454200682,
420
+ "grad_norm": 0.2681204080581665,
421
  "learning_rate": 2.6305682329497977e-05,
422
+ "loss": 0.1734,
423
  "step": 29500
424
  },
425
  {
426
  "epoch": 0.9641961817831202,
427
+ "grad_norm": 0.38170355558395386,
428
  "learning_rate": 2.5903933920421676e-05,
429
+ "loss": 0.1701,
430
  "step": 30000
431
  },
432
  {
433
  "epoch": 0.9802661181461721,
434
+ "grad_norm": 0.43841251730918884,
435
+ "learning_rate": 2.550298900816353e-05,
436
+ "loss": 0.1656,
437
  "step": 30500
438
  },
439
  {
440
  "epoch": 0.9963360545092241,
441
+ "grad_norm": 0.4082754850387573,
442
  "learning_rate": 2.510124059908723e-05,
443
+ "loss": 0.1649,
444
  "step": 31000
445
  },
446
  {
447
  "epoch": 1.0124059908722762,
448
+ "grad_norm": 0.27510714530944824,
449
  "learning_rate": 2.4699492190010928e-05,
450
+ "loss": 0.1636,
451
  "step": 31500
452
  },
453
  {
454
  "epoch": 1.028475927235328,
455
+ "grad_norm": 0.3550429344177246,
456
  "learning_rate": 2.429774378093463e-05,
457
+ "loss": 0.1615,
458
  "step": 32000
459
  },
460
  {
461
  "epoch": 1.0445458635983802,
462
+ "grad_norm": 0.382055401802063,
463
  "learning_rate": 2.389599537185833e-05,
464
+ "loss": 0.1597,
465
  "step": 32500
466
  },
467
  {
468
  "epoch": 1.060615799961432,
469
+ "grad_norm": 0.38698843121528625,
470
+ "learning_rate": 2.349424696278203e-05,
471
+ "loss": 0.155,
472
  "step": 33000
473
  },
474
  {
475
  "epoch": 1.0766857363244842,
476
+ "grad_norm": 0.380403995513916,
477
+ "learning_rate": 2.309249855370573e-05,
478
+ "loss": 0.1594,
479
  "step": 33500
480
  },
481
  {
482
  "epoch": 1.0927556726875363,
483
+ "grad_norm": 0.17210371792316437,
484
  "learning_rate": 2.269155364144758e-05,
485
+ "loss": 0.1543,
486
  "step": 34000
487
  },
488
  {
489
  "epoch": 1.1088256090505881,
490
+ "grad_norm": 0.33378392457962036,
491
  "learning_rate": 2.228980523237128e-05,
492
+ "loss": 0.1549,
493
  "step": 34500
494
  },
495
  {
496
  "epoch": 1.1248955454136402,
497
+ "grad_norm": 0.282175213098526,
498
+ "learning_rate": 2.1888056823294982e-05,
499
+ "loss": 0.1509,
500
  "step": 35000
501
  },
502
  {
503
  "epoch": 1.140965481776692,
504
+ "grad_norm": 0.4829972982406616,
505
+ "learning_rate": 2.148630841421868e-05,
506
+ "loss": 0.1508,
507
  "step": 35500
508
  },
509
  {
510
  "epoch": 1.1570354181397442,
511
+ "grad_norm": 0.4101378321647644,
512
+ "learning_rate": 2.1084560005142383e-05,
513
+ "loss": 0.1487,
514
  "step": 36000
515
  },
516
  {
517
  "epoch": 1.173105354502796,
518
+ "grad_norm": 0.24467173218727112,
519
+ "learning_rate": 2.0682811596066082e-05,
520
+ "loss": 0.1482,
521
  "step": 36500
522
  },
523
  {
524
  "epoch": 1.1891752908658482,
525
+ "grad_norm": 0.2552469074726105,
526
+ "learning_rate": 2.028106318698978e-05,
527
+ "loss": 0.1474,
528
  "step": 37000
529
  },
530
  {
531
  "epoch": 1.2052452272289003,
532
+ "grad_norm": 0.33155035972595215,
533
+ "learning_rate": 1.987931477791348e-05,
534
+ "loss": 0.1427,
535
  "step": 37500
536
  },
537
  {
538
  "epoch": 1.2213151635919521,
539
+ "grad_norm": 0.41133707761764526,
540
+ "learning_rate": 1.9478369865655334e-05,
541
+ "loss": 0.143,
542
  "step": 38000
543
  },
544
  {
545
  "epoch": 1.2373850999550042,
546
+ "grad_norm": 0.36144211888313293,
547
+ "learning_rate": 1.9076621456579033e-05,
548
+ "loss": 0.1387,
549
  "step": 38500
550
  },
551
  {
552
  "epoch": 1.253455036318056,
553
+ "grad_norm": 0.36597776412963867,
554
+ "learning_rate": 1.8674873047502732e-05,
555
+ "loss": 0.1415,
556
  "step": 39000
557
  },
558
  {
559
  "epoch": 1.2695249726811082,
560
+ "grad_norm": 0.37640953063964844,
561
+ "learning_rate": 1.8273124638426434e-05,
562
+ "loss": 0.1408,
563
  "step": 39500
564
  },
565
  {
566
  "epoch": 1.28559490904416,
567
+ "grad_norm": 0.22886815667152405,
568
  "learning_rate": 1.7872983222986438e-05,
569
+ "loss": 0.1366,
570
  "step": 40000
571
  },
572
  {
573
  "epoch": 1.3016648454072122,
574
+ "grad_norm": 0.44980695843696594,
575
  "learning_rate": 1.7471234813910137e-05,
576
+ "loss": 0.1411,
577
  "step": 40500
578
  },
579
  {
580
  "epoch": 1.3177347817702643,
581
+ "grad_norm": 0.46285852789878845,
582
+ "learning_rate": 1.706948640483384e-05,
583
+ "loss": 0.1367,
584
  "step": 41000
585
  },
586
  {
587
  "epoch": 1.3338047181333161,
588
+ "grad_norm": 0.1757335215806961,
589
+ "learning_rate": 1.6667737995757538e-05,
590
+ "loss": 0.1361,
591
  "step": 41500
592
  },
593
  {
594
  "epoch": 1.3498746544963682,
595
+ "grad_norm": 0.28056710958480835,
596
+ "learning_rate": 1.6265989586681236e-05,
597
+ "loss": 0.1371,
598
  "step": 42000
599
  },
600
  {
601
  "epoch": 1.3659445908594203,
602
+ "grad_norm": 0.4234681725502014,
603
+ "learning_rate": 1.586424117760494e-05,
604
+ "loss": 0.1363,
605
  "step": 42500
606
  },
607
  {
608
  "epoch": 1.3820145272224722,
609
+ "grad_norm": 0.2925218641757965,
610
+ "learning_rate": 1.5462492768528637e-05,
611
+ "loss": 0.1336,
612
  "step": 43000
613
  },
614
  {
615
  "epoch": 1.398084463585524,
616
+ "grad_norm": 0.23110254108905792,
617
+ "learning_rate": 1.5060744359452336e-05,
618
+ "loss": 0.1305,
619
  "step": 43500
620
  },
621
  {
622
  "epoch": 1.4141543999485762,
623
+ "grad_norm": 0.4187003970146179,
624
+ "learning_rate": 1.4659799447194189e-05,
625
+ "loss": 0.1374,
626
  "step": 44000
627
  },
628
  {
629
  "epoch": 1.4302243363116283,
630
+ "grad_norm": 0.30868059396743774,
631
+ "learning_rate": 1.425805103811789e-05,
632
+ "loss": 0.1332,
633
  "step": 44500
634
  },
635
  {
636
  "epoch": 1.4462942726746801,
637
+ "grad_norm": 0.24373352527618408,
638
+ "learning_rate": 1.385630262904159e-05,
639
+ "loss": 0.133,
640
  "step": 45000
641
  },
642
  {
643
  "epoch": 1.4623642090377322,
644
+ "grad_norm": 0.3976458013057709,
645
+ "learning_rate": 1.345455421996529e-05,
646
+ "loss": 0.1317,
647
  "step": 45500
648
  },
649
  {
650
  "epoch": 1.4784341454007843,
651
+ "grad_norm": 0.15130922198295593,
652
+ "learning_rate": 1.3053609307707144e-05,
653
+ "loss": 0.1294,
654
  "step": 46000
655
  },
656
  {
657
  "epoch": 1.4945040817638362,
658
+ "grad_norm": 0.26361921429634094,
659
  "learning_rate": 1.2652664395448997e-05,
660
+ "loss": 0.1316,
661
  "step": 46500
662
  },
663
  {
664
  "epoch": 1.510574018126888,
665
+ "grad_norm": 0.3039293587207794,
666
  "learning_rate": 1.2250915986372695e-05,
667
+ "loss": 0.1294,
668
  "step": 47000
669
  },
670
  {
671
  "epoch": 1.5266439544899402,
672
+ "grad_norm": 0.23085398972034454,
673
  "learning_rate": 1.1849167577296394e-05,
674
+ "loss": 0.1304,
675
  "step": 47500
676
  },
677
  {
678
  "epoch": 1.5427138908529923,
679
+ "grad_norm": 0.45066356658935547,
680
+ "learning_rate": 1.1447419168220095e-05,
681
+ "loss": 0.1283,
682
  "step": 48000
683
  },
684
  {
685
  "epoch": 1.5587838272160441,
686
+ "grad_norm": 0.2428194135427475,
687
+ "learning_rate": 1.1045670759143795e-05,
688
+ "loss": 0.1279,
689
  "step": 48500
690
  },
691
  {
692
  "epoch": 1.5748537635790962,
693
+ "grad_norm": 0.15587645769119263,
694
+ "learning_rate": 1.0643922350067494e-05,
695
+ "loss": 0.1273,
696
  "step": 49000
697
  },
698
  {
699
  "epoch": 1.5909236999421483,
700
+ "grad_norm": 0.5055563449859619,
701
  "learning_rate": 1.0242977437809347e-05,
702
+ "loss": 0.127,
703
  "step": 49500
704
  },
705
  {
706
  "epoch": 1.6069936363052002,
707
+ "grad_norm": 0.31220686435699463,
708
+ "learning_rate": 9.841229028733047e-06,
709
+ "loss": 0.1284,
710
  "step": 50000
711
  },
712
  {
713
  "epoch": 1.623063572668252,
714
+ "grad_norm": 0.3776426613330841,
715
+ "learning_rate": 9.439480619656748e-06,
716
+ "loss": 0.1251,
717
  "step": 50500
718
  },
719
  {
720
  "epoch": 1.6391335090313044,
721
+ "grad_norm": 0.2834898829460144,
722
+ "learning_rate": 9.037732210580447e-06,
723
+ "loss": 0.1226,
724
  "step": 51000
725
  },
726
  {
727
  "epoch": 1.6552034453943563,
728
+ "grad_norm": 0.2295331507921219,
729
+ "learning_rate": 8.635983801504147e-06,
730
+ "loss": 0.1233,
731
  "step": 51500
732
  },
733
  {
734
  "epoch": 1.6712733817574081,
735
+ "grad_norm": 0.22921015322208405,
736
+ "learning_rate": 8.234235392427848e-06,
737
+ "loss": 0.1256,
738
  "step": 52000
739
  },
740
  {
741
  "epoch": 1.6873433181204602,
742
+ "grad_norm": 0.3294677138328552,
743
+ "learning_rate": 7.832486983351546e-06,
744
+ "loss": 0.1257,
745
  "step": 52500
746
  },
747
  {
748
  "epoch": 1.7034132544835123,
749
+ "grad_norm": 0.21186766028404236,
750
+ "learning_rate": 7.430738574275246e-06,
751
+ "loss": 0.1254,
752
  "step": 53000
753
  },
754
  {
755
  "epoch": 1.7194831908465642,
756
+ "grad_norm": 0.43346577882766724,
757
+ "learning_rate": 7.029793662017099e-06,
758
+ "loss": 0.1228,
759
  "step": 53500
760
  },
761
  {
762
  "epoch": 1.7355531272096163,
763
+ "grad_norm": 0.20274986326694489,
764
+ "learning_rate": 6.628045252940798e-06,
765
+ "loss": 0.124,
766
  "step": 54000
767
  },
768
  {
769
  "epoch": 1.7516230635726684,
770
+ "grad_norm": 0.2912587523460388,
771
+ "learning_rate": 6.2262968438644984e-06,
772
+ "loss": 0.1236,
773
  "step": 54500
774
  },
775
  {
776
  "epoch": 1.7676929999357203,
777
+ "grad_norm": 0.5663316249847412,
778
+ "learning_rate": 5.824548434788198e-06,
779
+ "loss": 0.1236,
780
  "step": 55000
781
  },
782
  {
783
  "epoch": 1.7837629362987721,
784
+ "grad_norm": 0.2563399076461792,
785
+ "learning_rate": 5.423603522530051e-06,
786
+ "loss": 0.1241,
787
  "step": 55500
788
  },
789
  {
790
  "epoch": 1.7998328726618242,
791
+ "grad_norm": 0.26923516392707825,
792
  "learning_rate": 5.022658610271903e-06,
793
+ "loss": 0.1231,
794
  "step": 56000
795
  },
796
  {
797
  "epoch": 1.8159028090248763,
798
+ "grad_norm": 0.15516141057014465,
799
  "learning_rate": 4.620910201195604e-06,
800
+ "loss": 0.1225,
801
  "step": 56500
802
  },
803
  {
804
  "epoch": 1.8319727453879282,
805
+ "grad_norm": 0.1603991985321045,
806
  "learning_rate": 4.219161792119303e-06,
807
+ "loss": 0.1236,
808
  "step": 57000
809
  },
810
  {
811
  "epoch": 1.8480426817509803,
812
+ "grad_norm": 0.3031301498413086,
813
  "learning_rate": 3.817413383043003e-06,
814
+ "loss": 0.124,
815
  "step": 57500
816
  },
817
  {
818
  "epoch": 1.8641126181140324,
819
+ "grad_norm": 0.25160399079322815,
820
+ "learning_rate": 3.4156649739667035e-06,
821
+ "loss": 0.1212,
822
  "step": 58000
823
  },
824
  {
825
  "epoch": 1.8801825544770843,
826
+ "grad_norm": 0.23327353596687317,
827
+ "learning_rate": 3.013916564890403e-06,
828
+ "loss": 0.1199,
829
  "step": 58500
830
  },
831
  {
832
  "epoch": 1.8962524908401361,
833
+ "grad_norm": 0.23530858755111694,
834
  "learning_rate": 2.6129716526322558e-06,
835
+ "loss": 0.1228,
836
  "step": 59000
837
  },
838
  {
839
  "epoch": 1.9123224272031882,
840
+ "grad_norm": 0.20596709847450256,
841
  "learning_rate": 2.211223243555956e-06,
842
+ "loss": 0.1205,
843
  "step": 59500
844
  },
845
  {
846
  "epoch": 1.9283923635662403,
847
+ "grad_norm": 0.35043200850486755,
848
+ "learning_rate": 1.8094748344796555e-06,
849
+ "loss": 0.1188,
850
  "step": 60000
851
  },
852
  {
853
  "epoch": 1.9444622999292922,
854
+ "grad_norm": 0.21463052928447723,
855
+ "learning_rate": 1.4077264254033555e-06,
856
+ "loss": 0.1225,
857
  "step": 60500
858
  },
859
  {
860
  "epoch": 1.9605322362923443,
861
+ "grad_norm": 0.27506574988365173,
862
+ "learning_rate": 1.0059780163270554e-06,
863
+ "loss": 0.1233,
864
  "step": 61000
865
  },
866
  {
867
  "epoch": 1.9766021726553964,
868
+ "grad_norm": 0.3260590732097626,
869
+ "learning_rate": 6.042296072507553e-07,
870
+ "loss": 0.1218,
871
  "step": 61500
872
  },
873
  {
874
  "epoch": 1.9926721090184483,
875
+ "grad_norm": 0.2609338164329529,
876
+ "learning_rate": 2.0248119817445525e-07,
877
+ "loss": 0.1212,
878
  "step": 62000
879
  }
880
  ],
 
895
  "attributes": {}
896
  }
897
  },
898
+ "total_flos": 1.3475225258478797e+17,
899
  "train_batch_size": 32,
900
  "trial_name": null,
901
  "trial_params": null
config.json DELETED
@@ -1,60 +0,0 @@
1
- {
2
- "architectures": [
3
- "T5ForConditionalGeneration"
4
- ],
5
- "classifier_dropout": 0.0,
6
- "d_ff": 2048,
7
- "d_kv": 64,
8
- "d_model": 512,
9
- "decoder_start_token_id": 0,
10
- "dense_act_fn": "relu",
11
- "dropout_rate": 0.1,
12
- "eos_token_id": 1,
13
- "feed_forward_proj": "relu",
14
- "initializer_factor": 1.0,
15
- "is_encoder_decoder": true,
16
- "is_gated_act": false,
17
- "layer_norm_epsilon": 1e-06,
18
- "model_type": "t5",
19
- "n_positions": 512,
20
- "num_decoder_layers": 6,
21
- "num_heads": 8,
22
- "num_layers": 6,
23
- "output_past": true,
24
- "pad_token_id": 0,
25
- "relative_attention_max_distance": 128,
26
- "relative_attention_num_buckets": 32,
27
- "task_specific_params": {
28
- "summarization": {
29
- "early_stopping": true,
30
- "length_penalty": 2.0,
31
- "max_length": 200,
32
- "min_length": 30,
33
- "no_repeat_ngram_size": 3,
34
- "num_beams": 4,
35
- "prefix": "summarize: "
36
- },
37
- "translation_en_to_de": {
38
- "early_stopping": true,
39
- "max_length": 300,
40
- "num_beams": 4,
41
- "prefix": "translate English to German: "
42
- },
43
- "translation_en_to_fr": {
44
- "early_stopping": true,
45
- "max_length": 300,
46
- "num_beams": 4,
47
- "prefix": "translate English to French: "
48
- },
49
- "translation_en_to_ro": {
50
- "early_stopping": true,
51
- "max_length": 300,
52
- "num_beams": 4,
53
- "prefix": "translate English to Romanian: "
54
- }
55
- },
56
- "torch_dtype": "float32",
57
- "transformers_version": "4.51.2",
58
- "use_cache": true,
59
- "vocab_size": 32128
60
- }
generation_config.json DELETED
@@ -1,7 +0,0 @@
1
- {
2
- "_from_model_config": true,
3
- "decoder_start_token_id": 0,
4
- "eos_token_id": 1,
5
- "pad_token_id": 0,
6
- "transformers_version": "4.51.2"
7
- }
requirements.txt CHANGED
@@ -5,3 +5,4 @@ torch==2.5.1
5
  tqdm==4.67.1
6
  transformers==4.51.2
7
  wikiextractor==3.0.7
8
+ sentencepiece==0.2.0
special_tokens_map.json DELETED
@@ -1,125 +0,0 @@
1
- {
2
- "additional_special_tokens": [
3
- "<extra_id_0>",
4
- "<extra_id_1>",
5
- "<extra_id_2>",
6
- "<extra_id_3>",
7
- "<extra_id_4>",
8
- "<extra_id_5>",
9
- "<extra_id_6>",
10
- "<extra_id_7>",
11
- "<extra_id_8>",
12
- "<extra_id_9>",
13
- "<extra_id_10>",
14
- "<extra_id_11>",
15
- "<extra_id_12>",
16
- "<extra_id_13>",
17
- "<extra_id_14>",
18
- "<extra_id_15>",
19
- "<extra_id_16>",
20
- "<extra_id_17>",
21
- "<extra_id_18>",
22
- "<extra_id_19>",
23
- "<extra_id_20>",
24
- "<extra_id_21>",
25
- "<extra_id_22>",
26
- "<extra_id_23>",
27
- "<extra_id_24>",
28
- "<extra_id_25>",
29
- "<extra_id_26>",
30
- "<extra_id_27>",
31
- "<extra_id_28>",
32
- "<extra_id_29>",
33
- "<extra_id_30>",
34
- "<extra_id_31>",
35
- "<extra_id_32>",
36
- "<extra_id_33>",
37
- "<extra_id_34>",
38
- "<extra_id_35>",
39
- "<extra_id_36>",
40
- "<extra_id_37>",
41
- "<extra_id_38>",
42
- "<extra_id_39>",
43
- "<extra_id_40>",
44
- "<extra_id_41>",
45
- "<extra_id_42>",
46
- "<extra_id_43>",
47
- "<extra_id_44>",
48
- "<extra_id_45>",
49
- "<extra_id_46>",
50
- "<extra_id_47>",
51
- "<extra_id_48>",
52
- "<extra_id_49>",
53
- "<extra_id_50>",
54
- "<extra_id_51>",
55
- "<extra_id_52>",
56
- "<extra_id_53>",
57
- "<extra_id_54>",
58
- "<extra_id_55>",
59
- "<extra_id_56>",
60
- "<extra_id_57>",
61
- "<extra_id_58>",
62
- "<extra_id_59>",
63
- "<extra_id_60>",
64
- "<extra_id_61>",
65
- "<extra_id_62>",
66
- "<extra_id_63>",
67
- "<extra_id_64>",
68
- "<extra_id_65>",
69
- "<extra_id_66>",
70
- "<extra_id_67>",
71
- "<extra_id_68>",
72
- "<extra_id_69>",
73
- "<extra_id_70>",
74
- "<extra_id_71>",
75
- "<extra_id_72>",
76
- "<extra_id_73>",
77
- "<extra_id_74>",
78
- "<extra_id_75>",
79
- "<extra_id_76>",
80
- "<extra_id_77>",
81
- "<extra_id_78>",
82
- "<extra_id_79>",
83
- "<extra_id_80>",
84
- "<extra_id_81>",
85
- "<extra_id_82>",
86
- "<extra_id_83>",
87
- "<extra_id_84>",
88
- "<extra_id_85>",
89
- "<extra_id_86>",
90
- "<extra_id_87>",
91
- "<extra_id_88>",
92
- "<extra_id_89>",
93
- "<extra_id_90>",
94
- "<extra_id_91>",
95
- "<extra_id_92>",
96
- "<extra_id_93>",
97
- "<extra_id_94>",
98
- "<extra_id_95>",
99
- "<extra_id_96>",
100
- "<extra_id_97>",
101
- "<extra_id_98>",
102
- "<extra_id_99>"
103
- ],
104
- "eos_token": {
105
- "content": "</s>",
106
- "lstrip": false,
107
- "normalized": false,
108
- "rstrip": false,
109
- "single_word": false
110
- },
111
- "pad_token": {
112
- "content": "<pad>",
113
- "lstrip": false,
114
- "normalized": false,
115
- "rstrip": false,
116
- "single_word": false
117
- },
118
- "unk_token": {
119
- "content": "<unk>",
120
- "lstrip": false,
121
- "normalized": false,
122
- "rstrip": false,
123
- "single_word": false
124
- }
125
- }
src/tokeniser/added_tokens.json ADDED
@@ -0,0 +1,102 @@
1
+ {
2
+ "<extra_id_0>": 40099,
3
+ "<extra_id_10>": 40089,
4
+ "<extra_id_11>": 40088,
5
+ "<extra_id_12>": 40087,
6
+ "<extra_id_13>": 40086,
7
+ "<extra_id_14>": 40085,
8
+ "<extra_id_15>": 40084,
9
+ "<extra_id_16>": 40083,
10
+ "<extra_id_17>": 40082,
11
+ "<extra_id_18>": 40081,
12
+ "<extra_id_19>": 40080,
13
+ "<extra_id_1>": 40098,
14
+ "<extra_id_20>": 40079,
15
+ "<extra_id_21>": 40078,
16
+ "<extra_id_22>": 40077,
17
+ "<extra_id_23>": 40076,
18
+ "<extra_id_24>": 40075,
19
+ "<extra_id_25>": 40074,
20
+ "<extra_id_26>": 40073,
21
+ "<extra_id_27>": 40072,
22
+ "<extra_id_28>": 40071,
23
+ "<extra_id_29>": 40070,
24
+ "<extra_id_2>": 40097,
25
+ "<extra_id_30>": 40069,
26
+ "<extra_id_31>": 40068,
27
+ "<extra_id_32>": 40067,
28
+ "<extra_id_33>": 40066,
29
+ "<extra_id_34>": 40065,
30
+ "<extra_id_35>": 40064,
31
+ "<extra_id_36>": 40063,
32
+ "<extra_id_37>": 40062,
33
+ "<extra_id_38>": 40061,
34
+ "<extra_id_39>": 40060,
35
+ "<extra_id_3>": 40096,
36
+ "<extra_id_40>": 40059,
37
+ "<extra_id_41>": 40058,
38
+ "<extra_id_42>": 40057,
39
+ "<extra_id_43>": 40056,
40
+ "<extra_id_44>": 40055,
41
+ "<extra_id_45>": 40054,
42
+ "<extra_id_46>": 40053,
43
+ "<extra_id_47>": 40052,
44
+ "<extra_id_48>": 40051,
45
+ "<extra_id_49>": 40050,
46
+ "<extra_id_4>": 40095,
47
+ "<extra_id_50>": 40049,
48
+ "<extra_id_51>": 40048,
49
+ "<extra_id_52>": 40047,
50
+ "<extra_id_53>": 40046,
51
+ "<extra_id_54>": 40045,
52
+ "<extra_id_55>": 40044,
53
+ "<extra_id_56>": 40043,
54
+ "<extra_id_57>": 40042,
55
+ "<extra_id_58>": 40041,
56
+ "<extra_id_59>": 40040,
57
+ "<extra_id_5>": 40094,
58
+ "<extra_id_60>": 40039,
59
+ "<extra_id_61>": 40038,
60
+ "<extra_id_62>": 40037,
61
+ "<extra_id_63>": 40036,
62
+ "<extra_id_64>": 40035,
63
+ "<extra_id_65>": 40034,
64
+ "<extra_id_66>": 40033,
65
+ "<extra_id_67>": 40032,
66
+ "<extra_id_68>": 40031,
67
+ "<extra_id_69>": 40030,
68
+ "<extra_id_6>": 40093,
69
+ "<extra_id_70>": 40029,
70
+ "<extra_id_71>": 40028,
71
+ "<extra_id_72>": 40027,
72
+ "<extra_id_73>": 40026,
73
+ "<extra_id_74>": 40025,
74
+ "<extra_id_75>": 40024,
75
+ "<extra_id_76>": 40023,
76
+ "<extra_id_77>": 40022,
77
+ "<extra_id_78>": 40021,
78
+ "<extra_id_79>": 40020,
79
+ "<extra_id_7>": 40092,
80
+ "<extra_id_80>": 40019,
81
+ "<extra_id_81>": 40018,
82
+ "<extra_id_82>": 40017,
83
+ "<extra_id_83>": 40016,
84
+ "<extra_id_84>": 40015,
85
+ "<extra_id_85>": 40014,
86
+ "<extra_id_86>": 40013,
87
+ "<extra_id_87>": 40012,
88
+ "<extra_id_88>": 40011,
89
+ "<extra_id_89>": 40010,
90
+ "<extra_id_8>": 40091,
91
+ "<extra_id_90>": 40009,
92
+ "<extra_id_91>": 40008,
93
+ "<extra_id_92>": 40007,
94
+ "<extra_id_93>": 40006,
95
+ "<extra_id_94>": 40005,
96
+ "<extra_id_95>": 40004,
97
+ "<extra_id_96>": 40003,
98
+ "<extra_id_97>": 40002,
99
+ "<extra_id_98>": 40001,
100
+ "<extra_id_99>": 40000,
101
+ "<extra_id_9>": 40090
102
+ }
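Review note: the ids in `added_tokens.json` follow a simple descending pattern, with `<extra_id_0>` at 40099 and `<extra_id_99>` at 40000. The sketch below reproduces that mapping so it can be checked against the file. It is an illustration only, not the project's code: the constants `BASE_VOCAB_SIZE` and `NUM_SENTINELS` and both function names are assumptions inferred from the ids listed above.

```python
# Hypothetical helper, not part of the repository.
# Assumption: the SentencePiece model holds 40,000 base pieces and the 100
# sentinel tokens are appended after them in reverse order, as the ids in
# added_tokens.json suggest.
BASE_VOCAB_SIZE = 40_000
NUM_SENTINELS = 100


def sentinel_id(n: int) -> int:
    """Return the id expected for <extra_id_n> under the assumptions above."""
    if not 0 <= n < NUM_SENTINELS:
        raise ValueError(f"sentinel index out of range: {n}")
    # <extra_id_0> takes the highest id, <extra_id_99> the lowest.
    return BASE_VOCAB_SIZE + NUM_SENTINELS - 1 - n


def build_added_tokens() -> dict[str, int]:
    """Rebuild the token-to-id mapping in the same shape as added_tokens.json."""
    return {f"<extra_id_{n}>": sentinel_id(n) for n in range(NUM_SENTINELS)}


if __name__ == "__main__":
    tokens = build_added_tokens()
    # Spot-check a few entries against the diff above.
    for name in ("<extra_id_0>", "<extra_id_10>", "<extra_id_99>"):
        print(name, tokens[name])
```

Comparing the output of `build_added_tokens()` with the committed JSON is a quick way to confirm that no sentinel id collides with a base vocabulary piece after the tokeniser is retrained.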
model.safetensors → src/tokeniser/dalat5_sp.model RENAMED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:7fee19ce79c6f45de80a2e273ede68b16d500dae3a2e3da26235d6b4ebc0f92e
3
- size 242041896
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:3365205d18a2c0699fb0ee86ab06f3042d553acaa219eb11aa77c3c56f638538
3
+ size 1047337
src/tokeniser/dalat5_sp.vocab ADDED
The diff for this file is too large to render. See raw diff
 
src/tokeniser/special_tokens_map.json CHANGED
@@ -101,8 +101,25 @@
101
  "<extra_id_98>",
102
  "<extra_id_99>"
103
  ],
104
- "bos_token": "<s>",
105
- "eos_token": "</s>",
106
- "pad_token": "<pad>",
107
- "unk_token": "<unk>"
108
  }
 
101
  "<extra_id_98>",
102
  "<extra_id_99>"
103
  ],
104
+ "eos_token": {
105
+ "content": "</s>",
106
+ "lstrip": false,
107
+ "normalized": false,
108
+ "rstrip": false,
109
+ "single_word": false
110
+ },
111
+ "pad_token": {
112
+ "content": "<pad>",
113
+ "lstrip": false,
114
+ "normalized": false,
115
+ "rstrip": false,
116
+ "single_word": false
117
+ },
118
+ "unk_token": {
119
+ "content": "<unk>",
120
+ "lstrip": false,
121
+ "normalized": false,
122
+ "rstrip": false,
123
+ "single_word": false
124
+ }
125
  }
spiece.model → src/tokeniser/spiece.model RENAMED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:d60acb128cf7b7f2536e8f38a5b18a05535c9e14c7a355904270e15b0945ea86
3
- size 791656
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:3365205d18a2c0699fb0ee86ab06f3042d553acaa219eb11aa77c3c56f638538
3
+ size 1047337
src/tokeniser/tokenizer.json DELETED
The diff for this file is too large to render. See raw diff
 
src/tokeniser/tokenizer_config.json CHANGED
@@ -1,5 +1,5 @@
1
  {
2
- "add_prefix_space": null,
3
  "added_tokens_decoder": {
4
  "0": {
5
  "content": "<pad>",
@@ -10,14 +10,14 @@
10
  "special": true
11
  },
12
  "1": {
13
- "content": "<s>",
14
  "lstrip": false,
15
  "normalized": false,
16
  "rstrip": false,
17
  "single_word": false,
18
  "special": true
19
  },
20
- "2": {
21
  "content": "</s>",
22
  "lstrip": false,
23
  "normalized": false,
@@ -25,811 +25,803 @@
25
  "single_word": false,
26
  "special": true
27
  },
28
- "3": {
29
- "content": "<unk>",
30
- "lstrip": false,
31
  "normalized": false,
32
- "rstrip": false,
33
  "single_word": false,
34
  "special": true
35
  },
36
- "8000": {
37
- "content": "<extra_id_0>",
38
- "lstrip": false,
39
  "normalized": false,
40
- "rstrip": false,
41
  "single_word": false,
42
  "special": true
43
  },
44
- "8001": {
45
- "content": "<extra_id_1>",
46
- "lstrip": false,
47
  "normalized": false,
48
- "rstrip": false,
49
  "single_word": false,
50
  "special": true
51
  },
52
- "8002": {
53
- "content": "<extra_id_2>",
54
- "lstrip": false,
55
  "normalized": false,
56
- "rstrip": false,
57
  "single_word": false,
58
  "special": true
59
  },
60
- "8003": {
61
- "content": "<extra_id_3>",
62
- "lstrip": false,
63
  "normalized": false,
64
- "rstrip": false,
65
  "single_word": false,
66
  "special": true
67
  },
68
- "8004": {
69
- "content": "<extra_id_4>",
70
- "lstrip": false,
71
  "normalized": false,
72
- "rstrip": false,
73
  "single_word": false,
74
  "special": true
75
  },
76
- "8005": {
77
- "content": "<extra_id_5>",
78
- "lstrip": false,
79
  "normalized": false,
80
- "rstrip": false,
81
  "single_word": false,
82
  "special": true
83
  },
84
- "8006": {
85
- "content": "<extra_id_6>",
86
- "lstrip": false,
87
  "normalized": false,
88
- "rstrip": false,
89
  "single_word": false,
90
  "special": true
91
  },
92
- "8007": {
93
- "content": "<extra_id_7>",
94
- "lstrip": false,
95
  "normalized": false,
96
- "rstrip": false,
97
  "single_word": false,
98
  "special": true
99
  },
100
- "8008": {
101
- "content": "<extra_id_8>",
102
- "lstrip": false,
103
  "normalized": false,
104
- "rstrip": false,
105
  "single_word": false,
106
  "special": true
107
  },
108
- "8009": {
109
- "content": "<extra_id_9>",
110
- "lstrip": false,
111
  "normalized": false,
112
- "rstrip": false,
113
  "single_word": false,
114
  "special": true
115
  },
116
- "8010": {
117
- "content": "<extra_id_10>",
118
- "lstrip": false,
119
  "normalized": false,
120
- "rstrip": false,
121
  "single_word": false,
122
  "special": true
123
  },
124
- "8011": {
125
- "content": "<extra_id_11>",
126
- "lstrip": false,
127
  "normalized": false,
128
- "rstrip": false,
129
  "single_word": false,
130
  "special": true
131
  },
132
- "8012": {
133
- "content": "<extra_id_12>",
134
- "lstrip": false,
135
  "normalized": false,
136
- "rstrip": false,
137
  "single_word": false,
138
  "special": true
139
  },
140
- "8013": {
141
- "content": "<extra_id_13>",
142
- "lstrip": false,
143
  "normalized": false,
144
- "rstrip": false,
145
  "single_word": false,
146
  "special": true
147
  },
148
- "8014": {
149
- "content": "<extra_id_14>",
150
- "lstrip": false,
151
  "normalized": false,
152
- "rstrip": false,
153
  "single_word": false,
154
  "special": true
155
  },
156
- "8015": {
157
- "content": "<extra_id_15>",
158
- "lstrip": false,
159
  "normalized": false,
160
- "rstrip": false,
161
  "single_word": false,
162
  "special": true
163
  },
164
- "8016": {
165
- "content": "<extra_id_16>",
166
- "lstrip": false,
167
  "normalized": false,
168
- "rstrip": false,
169
  "single_word": false,
170
  "special": true
171
  },
172
- "8017": {
173
- "content": "<extra_id_17>",
174
- "lstrip": false,
175
  "normalized": false,
176
- "rstrip": false,
177
  "single_word": false,
178
  "special": true
179
  },
180
- "8018": {
181
- "content": "<extra_id_18>",
182
- "lstrip": false,
183
  "normalized": false,
184
- "rstrip": false,
185
  "single_word": false,
186
  "special": true
187
  },
188
- "8019": {
189
- "content": "<extra_id_19>",
190
- "lstrip": false,
191
  "normalized": false,
192
- "rstrip": false,
193
  "single_word": false,
194
  "special": true
195
  },
196
- "8020": {
197
- "content": "<extra_id_20>",
198
- "lstrip": false,
199
  "normalized": false,
200
- "rstrip": false,
201
  "single_word": false,
202
  "special": true
203
  },
204
- "8021": {
205
- "content": "<extra_id_21>",
206
- "lstrip": false,
207
  "normalized": false,
208
- "rstrip": false,
209
  "single_word": false,
210
  "special": true
211
  },
212
- "8022": {
213
- "content": "<extra_id_22>",
214
- "lstrip": false,
215
  "normalized": false,
216
- "rstrip": false,
217
  "single_word": false,
218
  "special": true
219
  },
220
- "8023": {
221
- "content": "<extra_id_23>",
222
- "lstrip": false,
223
  "normalized": false,
224
- "rstrip": false,
225
  "single_word": false,
226
  "special": true
227
  },
228
- "8024": {
229
- "content": "<extra_id_24>",
230
- "lstrip": false,
231
  "normalized": false,
232
- "rstrip": false,
233
  "single_word": false,
234
  "special": true
235
  },
236
- "8025": {
237
- "content": "<extra_id_25>",
238
- "lstrip": false,
239
  "normalized": false,
240
- "rstrip": false,
241
  "single_word": false,
242
  "special": true
243
  },
244
- "8026": {
245
- "content": "<extra_id_26>",
246
- "lstrip": false,
247
  "normalized": false,
248
- "rstrip": false,
249
  "single_word": false,
250
  "special": true
251
  },
252
- "8027": {
253
- "content": "<extra_id_27>",
254
- "lstrip": false,
255
  "normalized": false,
256
- "rstrip": false,
257
  "single_word": false,
258
  "special": true
259
  },
260
- "8028": {
261
- "content": "<extra_id_28>",
262
- "lstrip": false,
263
  "normalized": false,
264
- "rstrip": false,
265
  "single_word": false,
266
  "special": true
267
  },
268
- "8029": {
269
- "content": "<extra_id_29>",
270
- "lstrip": false,
271
  "normalized": false,
272
- "rstrip": false,
273
  "single_word": false,
274
  "special": true
275
  },
276
- "8030": {
277
- "content": "<extra_id_30>",
278
- "lstrip": false,
279
  "normalized": false,
280
- "rstrip": false,
281
  "single_word": false,
282
  "special": true
283
  },
284
- "8031": {
285
- "content": "<extra_id_31>",
286
- "lstrip": false,
287
  "normalized": false,
288
- "rstrip": false,
289
  "single_word": false,
290
  "special": true
291
  },
292
- "8032": {
293
- "content": "<extra_id_32>",
294
- "lstrip": false,
295
  "normalized": false,
296
- "rstrip": false,
297
  "single_word": false,
298
  "special": true
299
  },
300
- "8033": {
301
- "content": "<extra_id_33>",
302
- "lstrip": false,
303
  "normalized": false,
304
- "rstrip": false,
305
  "single_word": false,
306
  "special": true
307
  },
308
- "8034": {
309
- "content": "<extra_id_34>",
310
- "lstrip": false,
311
  "normalized": false,
312
- "rstrip": false,
313
  "single_word": false,
314
  "special": true
315
  },
316
- "8035": {
317
- "content": "<extra_id_35>",
318
- "lstrip": false,
319
  "normalized": false,
320
- "rstrip": false,
321
  "single_word": false,
322
  "special": true
323
  },
324
- "8036": {
325
- "content": "<extra_id_36>",
326
- "lstrip": false,
327
  "normalized": false,
328
- "rstrip": false,
329
  "single_word": false,
330
  "special": true
331
  },
332
- "8037": {
333
- "content": "<extra_id_37>",
334
- "lstrip": false,
335
  "normalized": false,
336
- "rstrip": false,
337
  "single_word": false,
338
  "special": true
339
  },
340
- "8038": {
341
- "content": "<extra_id_38>",
342
- "lstrip": false,
343
  "normalized": false,
344
- "rstrip": false,
345
  "single_word": false,
346
  "special": true
347
  },
348
- "8039": {
349
- "content": "<extra_id_39>",
350
- "lstrip": false,
351
  "normalized": false,
352
- "rstrip": false,
353
  "single_word": false,
354
  "special": true
355
  },
356
- "8040": {
357
- "content": "<extra_id_40>",
358
- "lstrip": false,
359
  "normalized": false,
360
- "rstrip": false,
361
  "single_word": false,
362
  "special": true
363
  },
364
- "8041": {
365
- "content": "<extra_id_41>",
366
- "lstrip": false,
367
  "normalized": false,
368
- "rstrip": false,
369
  "single_word": false,
370
  "special": true
371
  },
372
- "8042": {
373
- "content": "<extra_id_42>",
374
- "lstrip": false,
375
  "normalized": false,
376
- "rstrip": false,
377
  "single_word": false,
378
  "special": true
379
  },
380
- "8043": {
381
- "content": "<extra_id_43>",
382
- "lstrip": false,
383
  "normalized": false,
384
- "rstrip": false,
385
  "single_word": false,
386
  "special": true
387
  },
388
- "8044": {
389
- "content": "<extra_id_44>",
390
- "lstrip": false,
391
  "normalized": false,
392
- "rstrip": false,
393
  "single_word": false,
394
  "special": true
395
  },
396
- "8045": {
397
- "content": "<extra_id_45>",
398
- "lstrip": false,
399
  "normalized": false,
400
- "rstrip": false,
401
  "single_word": false,
402
  "special": true
403
  },
404
- "8046": {
405
- "content": "<extra_id_46>",
406
- "lstrip": false,
407
  "normalized": false,
408
- "rstrip": false,
409
  "single_word": false,
410
  "special": true
411
  },
412
- "8047": {
413
- "content": "<extra_id_47>",
414
- "lstrip": false,
415
  "normalized": false,
416
- "rstrip": false,
417
  "single_word": false,
418
  "special": true
419
  },
420
- "8048": {
421
- "content": "<extra_id_48>",
422
- "lstrip": false,
423
  "normalized": false,
424
- "rstrip": false,
425
  "single_word": false,
426
  "special": true
427
  },
428
- "8049": {
429
  "content": "<extra_id_49>",
430
- "lstrip": false,
431
  "normalized": false,
432
- "rstrip": false,
433
  "single_word": false,
434
  "special": true
435
  },
436
- "8050": {
437
- "content": "<extra_id_50>",
438
- "lstrip": false,
439
- "normalized": false,
440
- "rstrip": false,
441
- "single_word": false,
442
- "special": true
443
- },
444
- "8051": {
445
- "content": "<extra_id_51>",
446
- "lstrip": false,
447
  "normalized": false,
448
- "rstrip": false,
449
  "single_word": false,
450
  "special": true
451
  },
452
- "8052": {
453
- "content": "<extra_id_52>",
454
- "lstrip": false,
455
  "normalized": false,
456
- "rstrip": false,
457
  "single_word": false,
458
  "special": true
459
  },
460
- "8053": {
461
- "content": "<extra_id_53>",
462
- "lstrip": false,
463
  "normalized": false,
464
- "rstrip": false,
465
  "single_word": false,
466
  "special": true
467
  },
468
- "8054": {
469
- "content": "<extra_id_54>",
470
- "lstrip": false,
471
  "normalized": false,
472
- "rstrip": false,
473
  "single_word": false,
474
  "special": true
475
  },
476
- "8055": {
477
- "content": "<extra_id_55>",
478
- "lstrip": false,
479
  "normalized": false,
480
- "rstrip": false,
481
  "single_word": false,
482
  "special": true
483
  },
484
- "8056": {
485
- "content": "<extra_id_56>",
486
- "lstrip": false,
487
  "normalized": false,
488
- "rstrip": false,
489
  "single_word": false,
490
  "special": true
491
  },
492
- "8057": {
493
- "content": "<extra_id_57>",
494
- "lstrip": false,
495
  "normalized": false,
496
- "rstrip": false,
497
  "single_word": false,
498
  "special": true
499
  },
500
- "8058": {
501
- "content": "<extra_id_58>",
502
- "lstrip": false,
503
  "normalized": false,
504
- "rstrip": false,
505
  "single_word": false,
506
  "special": true
507
  },
508
- "8059": {
509
- "content": "<extra_id_59>",
510
- "lstrip": false,
511
  "normalized": false,
512
- "rstrip": false,
513
  "single_word": false,
514
  "special": true
515
  },
516
- "8060": {
517
- "content": "<extra_id_60>",
518
- "lstrip": false,
519
  "normalized": false,
520
- "rstrip": false,
521
  "single_word": false,
522
  "special": true
523
  },
524
- "8061": {
525
- "content": "<extra_id_61>",
526
- "lstrip": false,
527
  "normalized": false,
528
- "rstrip": false,
529
  "single_word": false,
530
  "special": true
531
  },
532
- "8062": {
533
- "content": "<extra_id_62>",
534
- "lstrip": false,
535
  "normalized": false,
536
- "rstrip": false,
537
  "single_word": false,
538
  "special": true
539
  },
540
- "8063": {
541
- "content": "<extra_id_63>",
542
- "lstrip": false,
543
  "normalized": false,
544
- "rstrip": false,
545
  "single_word": false,
546
  "special": true
547
  },
548
- "8064": {
549
- "content": "<extra_id_64>",
550
- "lstrip": false,
551
  "normalized": false,
552
- "rstrip": false,
553
  "single_word": false,
554
  "special": true
555
  },
556
- "8065": {
557
- "content": "<extra_id_65>",
558
- "lstrip": false,
559
  "normalized": false,
560
- "rstrip": false,
561
  "single_word": false,
562
  "special": true
563
  },
564
- "8066": {
565
- "content": "<extra_id_66>",
566
- "lstrip": false,
567
  "normalized": false,
568
- "rstrip": false,
569
  "single_word": false,
570
  "special": true
571
  },
572
- "8067": {
573
- "content": "<extra_id_67>",
574
- "lstrip": false,
575
  "normalized": false,
576
- "rstrip": false,
577
  "single_word": false,
578
  "special": true
579
  },
580
- "8068": {
581
- "content": "<extra_id_68>",
582
- "lstrip": false,
583
  "normalized": false,
584
- "rstrip": false,
585
  "single_word": false,
586
  "special": true
587
  },
588
- "8069": {
589
- "content": "<extra_id_69>",
590
- "lstrip": false,
591
  "normalized": false,
592
- "rstrip": false,
593
  "single_word": false,
594
  "special": true
595
  },
596
- "8070": {
597
- "content": "<extra_id_70>",
598
- "lstrip": false,
599
  "normalized": false,
600
- "rstrip": false,
601
  "single_word": false,
602
  "special": true
603
  },
604
- "8071": {
605
- "content": "<extra_id_71>",
606
- "lstrip": false,
607
  "normalized": false,
608
- "rstrip": false,
609
  "single_word": false,
610
  "special": true
611
  },
612
- "8072": {
613
- "content": "<extra_id_72>",
614
- "lstrip": false,
615
  "normalized": false,
616
- "rstrip": false,
617
  "single_word": false,
618
  "special": true
619
  },
620
- "8073": {
621
- "content": "<extra_id_73>",
622
- "lstrip": false,
623
  "normalized": false,
624
- "rstrip": false,
625
  "single_word": false,
626
  "special": true
627
  },
628
- "8074": {
629
- "content": "<extra_id_74>",
630
- "lstrip": false,
631
  "normalized": false,
632
- "rstrip": false,
633
  "single_word": false,
634
  "special": true
635
  },
636
- "8075": {
637
- "content": "<extra_id_75>",
638
- "lstrip": false,
639
  "normalized": false,
640
- "rstrip": false,
641
  "single_word": false,
642
  "special": true
643
  },
644
- "8076": {
645
- "content": "<extra_id_76>",
646
- "lstrip": false,
647
  "normalized": false,
648
- "rstrip": false,
649
  "single_word": false,
650
  "special": true
651
  },
652
- "8077": {
653
- "content": "<extra_id_77>",
654
- "lstrip": false,
655
  "normalized": false,
656
- "rstrip": false,
657
  "single_word": false,
658
  "special": true
659
  },
660
- "8078": {
661
- "content": "<extra_id_78>",
662
- "lstrip": false,
663
  "normalized": false,
664
- "rstrip": false,
665
  "single_word": false,
666
  "special": true
667
  },
668
- "8079": {
669
- "content": "<extra_id_79>",
670
- "lstrip": false,
671
  "normalized": false,
672
- "rstrip": false,
673
  "single_word": false,
674
  "special": true
675
  },
676
- "8080": {
677
- "content": "<extra_id_80>",
678
- "lstrip": false,
679
  "normalized": false,
680
- "rstrip": false,
681
  "single_word": false,
682
  "special": true
683
  },
684
- "8081": {
685
- "content": "<extra_id_81>",
686
- "lstrip": false,
687
  "normalized": false,
688
- "rstrip": false,
689
  "single_word": false,
690
  "special": true
691
  },
692
- "8082": {
693
- "content": "<extra_id_82>",
694
- "lstrip": false,
695
  "normalized": false,
696
- "rstrip": false,
697
  "single_word": false,
698
  "special": true
699
  },
700
- "8083": {
701
- "content": "<extra_id_83>",
702
- "lstrip": false,
703
  "normalized": false,
704
- "rstrip": false,
705
  "single_word": false,
706
  "special": true
707
  },
708
- "8084": {
709
- "content": "<extra_id_84>",
710
- "lstrip": false,
711
  "normalized": false,
712
- "rstrip": false,
713
  "single_word": false,
714
  "special": true
715
  },
716
- "8085": {
717
- "content": "<extra_id_85>",
718
- "lstrip": false,
719
  "normalized": false,
720
- "rstrip": false,
721
  "single_word": false,
722
  "special": true
723
  },
724
- "8086": {
725
- "content": "<extra_id_86>",
726
- "lstrip": false,
727
  "normalized": false,
728
- "rstrip": false,
729
  "single_word": false,
730
  "special": true
731
  },
732
- "8087": {
733
- "content": "<extra_id_87>",
734
- "lstrip": false,
735
  "normalized": false,
736
- "rstrip": false,
737
  "single_word": false,
738
  "special": true
739
  },
740
- "8088": {
741
- "content": "<extra_id_88>",
742
- "lstrip": false,
743
  "normalized": false,
744
- "rstrip": false,
745
  "single_word": false,
746
  "special": true
747
  },
748
- "8089": {
749
- "content": "<extra_id_89>",
750
- "lstrip": false,
751
  "normalized": false,
752
- "rstrip": false,
753
  "single_word": false,
754
  "special": true
755
  },
756
- "8090": {
757
- "content": "<extra_id_90>",
758
- "lstrip": false,
759
  "normalized": false,
760
- "rstrip": false,
761
  "single_word": false,
762
  "special": true
763
  },
764
- "8091": {
765
- "content": "<extra_id_91>",
766
- "lstrip": false,
767
  "normalized": false,
768
- "rstrip": false,
769
  "single_word": false,
770
  "special": true
771
  },
772
- "8092": {
773
- "content": "<extra_id_92>",
774
- "lstrip": false,
775
  "normalized": false,
776
- "rstrip": false,
777
  "single_word": false,
778
  "special": true
779
  },
780
- "8093": {
781
- "content": "<extra_id_93>",
782
- "lstrip": false,
783
  "normalized": false,
784
- "rstrip": false,
785
  "single_word": false,
786
  "special": true
787
  },
788
- "8094": {
789
- "content": "<extra_id_94>",
790
- "lstrip": false,
791
  "normalized": false,
792
- "rstrip": false,
793
  "single_word": false,
794
  "special": true
795
  },
796
- "8095": {
797
- "content": "<extra_id_95>",
798
- "lstrip": false,
799
  "normalized": false,
800
- "rstrip": false,
801
  "single_word": false,
802
  "special": true
803
  },
804
- "8096": {
805
- "content": "<extra_id_96>",
806
- "lstrip": false,
807
  "normalized": false,
808
- "rstrip": false,
809
  "single_word": false,
810
  "special": true
811
  },
812
- "8097": {
813
- "content": "<extra_id_97>",
814
- "lstrip": false,
815
  "normalized": false,
816
- "rstrip": false,
817
  "single_word": false,
818
  "special": true
819
  },
820
- "8098": {
821
- "content": "<extra_id_98>",
822
- "lstrip": false,
823
  "normalized": false,
824
- "rstrip": false,
825
  "single_word": false,
826
  "special": true
827
  },
828
- "8099": {
829
- "content": "<extra_id_99>",
830
- "lstrip": false,
831
  "normalized": false,
832
- "rstrip": false,
833
  "single_word": false,
834
  "special": true
835
  }
@@ -936,13 +928,14 @@
936
  "<extra_id_98>",
937
  "<extra_id_99>"
938
  ],
939
- "bos_token": "<s>",
940
  "clean_up_tokenization_spaces": false,
941
  "eos_token": "</s>",
942
  "extra_ids": 100,
943
  "extra_special_tokens": {},
 
944
  "model_max_length": 1000000000000000019884624838656,
945
  "pad_token": "<pad>",
946
- "tokenizer_class": "T5TokenizerFast",
 
947
  "unk_token": "<unk>"
948
  }
 
1
  {
2
+ "add_prefix_space": true,
3
  "added_tokens_decoder": {
4
  "0": {
5
  "content": "<pad>",
 
10
  "special": true
11
  },
12
  "1": {
13
+ "content": "<unk>",
14
  "lstrip": false,
15
  "normalized": false,
16
  "rstrip": false,
17
  "single_word": false,
18
  "special": true
19
  },
20
+ "3": {
21
  "content": "</s>",
22
  "lstrip": false,
23
  "normalized": false,
 
25
  "single_word": false,
26
  "special": true
27
  },
28
+ "40000": {
29
+ "content": "<extra_id_99>",
30
+ "lstrip": true,
31
  "normalized": false,
32
+ "rstrip": true,
33
  "single_word": false,
34
  "special": true
35
  },
36
+ "40001": {
37
+ "content": "<extra_id_98>",
38
+ "lstrip": true,
39
  "normalized": false,
40
+ "rstrip": true,
41
  "single_word": false,
42
  "special": true
43
  },
44
+ "40002": {
45
+ "content": "<extra_id_97>",
46
+ "lstrip": true,
47
  "normalized": false,
48
+ "rstrip": true,
49
  "single_word": false,
50
  "special": true
51
  },
52
+ "40003": {
53
+ "content": "<extra_id_96>",
54
+ "lstrip": true,
55
  "normalized": false,
56
+ "rstrip": true,
57
  "single_word": false,
58
  "special": true
59
  },
60
+ "40004": {
61
+ "content": "<extra_id_95>",
62
+ "lstrip": true,
63
  "normalized": false,
64
+ "rstrip": true,
65
  "single_word": false,
66
  "special": true
67
  },
68
+ "40005": {
69
+ "content": "<extra_id_94>",
70
+ "lstrip": true,
71
  "normalized": false,
72
+ "rstrip": true,
73
  "single_word": false,
74
  "special": true
75
  },
76
+ "40006": {
77
+ "content": "<extra_id_93>",
78
+ "lstrip": true,
79
  "normalized": false,
80
+ "rstrip": true,
81
  "single_word": false,
82
  "special": true
83
  },
84
+ "40007": {
85
+ "content": "<extra_id_92>",
86
+ "lstrip": true,
87
  "normalized": false,
88
+ "rstrip": true,
89
  "single_word": false,
90
  "special": true
91
  },
92
+ "40008": {
93
+ "content": "<extra_id_91>",
94
+ "lstrip": true,
95
  "normalized": false,
96
+ "rstrip": true,
97
  "single_word": false,
98
  "special": true
99
  },
100
+ "40009": {
101
+ "content": "<extra_id_90>",
102
+ "lstrip": true,
103
  "normalized": false,
104
+ "rstrip": true,
105
  "single_word": false,
106
  "special": true
107
  },
108
+ "40010": {
109
+ "content": "<extra_id_89>",
110
+ "lstrip": true,
111
  "normalized": false,
112
+ "rstrip": true,
113
  "single_word": false,
114
  "special": true
115
  },
116
+ "40011": {
117
+ "content": "<extra_id_88>",
118
+ "lstrip": true,
119
  "normalized": false,
120
+ "rstrip": true,
121
  "single_word": false,
122
  "special": true
123
  },
124
+ "40012": {
125
+ "content": "<extra_id_87>",
126
+ "lstrip": true,
127
  "normalized": false,
128
+ "rstrip": true,
129
  "single_word": false,
130
  "special": true
131
  },
132
+ "40013": {
133
+ "content": "<extra_id_86>",
134
+ "lstrip": true,
135
  "normalized": false,
136
+ "rstrip": true,
137
  "single_word": false,
138
  "special": true
139
  },
140
+ "40014": {
141
+ "content": "<extra_id_85>",
142
+ "lstrip": true,
143
  "normalized": false,
144
+ "rstrip": true,
145
  "single_word": false,
146
  "special": true
147
  },
148
+ "40015": {
149
+ "content": "<extra_id_84>",
150
+ "lstrip": true,
151
  "normalized": false,
152
+ "rstrip": true,
153
  "single_word": false,
154
  "special": true
155
  },
156
+ "40016": {
157
+ "content": "<extra_id_83>",
158
+ "lstrip": true,
159
  "normalized": false,
160
+ "rstrip": true,
161
  "single_word": false,
162
  "special": true
163
  },
164
+ "40017": {
165
+ "content": "<extra_id_82>",
166
+ "lstrip": true,
167
  "normalized": false,
168
+ "rstrip": true,
169
  "single_word": false,
170
  "special": true
171
  },
172
+ "40018": {
173
+ "content": "<extra_id_81>",
174
+ "lstrip": true,
175
  "normalized": false,
176
+ "rstrip": true,
177
  "single_word": false,
178
  "special": true
179
  },
180
+ "40019": {
181
+ "content": "<extra_id_80>",
182
+ "lstrip": true,
183
  "normalized": false,
184
+ "rstrip": true,
185
  "single_word": false,
186
  "special": true
187
  },
188
+ "40020": {
189
+ "content": "<extra_id_79>",
190
+ "lstrip": true,
191
  "normalized": false,
192
+ "rstrip": true,
193
  "single_word": false,
194
  "special": true
195
  },
196
+ "40021": {
197
+ "content": "<extra_id_78>",
198
+ "lstrip": true,
199
  "normalized": false,
200
+ "rstrip": true,
201
  "single_word": false,
202
  "special": true
203
  },
204
+ "40022": {
205
+ "content": "<extra_id_77>",
206
+ "lstrip": true,
207
  "normalized": false,
208
+ "rstrip": true,
209
  "single_word": false,
210
  "special": true
211
  },
212
+ "40023": {
213
+ "content": "<extra_id_76>",
214
+ "lstrip": true,
215
  "normalized": false,
216
+ "rstrip": true,
217
  "single_word": false,
218
  "special": true
219
  },
220
+ "40024": {
221
+ "content": "<extra_id_75>",
222
+ "lstrip": true,
223
  "normalized": false,
224
+ "rstrip": true,
225
  "single_word": false,
226
  "special": true
227
  },
228
+ "40025": {
229
+ "content": "<extra_id_74>",
230
+ "lstrip": true,
231
  "normalized": false,
232
+ "rstrip": true,
233
  "single_word": false,
234
  "special": true
235
  },
236
+ "40026": {
237
+ "content": "<extra_id_73>",
238
+ "lstrip": true,
239
  "normalized": false,
240
+ "rstrip": true,
241
  "single_word": false,
242
  "special": true
243
  },
244
+ "40027": {
245
+ "content": "<extra_id_72>",
246
+ "lstrip": true,
247
  "normalized": false,
248
+ "rstrip": true,
249
  "single_word": false,
250
  "special": true
251
  },
252
+ "40028": {
253
+ "content": "<extra_id_71>",
254
+ "lstrip": true,
255
  "normalized": false,
256
+ "rstrip": true,
257
  "single_word": false,
258
  "special": true
259
  },
260
+ "40029": {
261
+ "content": "<extra_id_70>",
262
+ "lstrip": true,
263
  "normalized": false,
264
+ "rstrip": true,
265
  "single_word": false,
266
  "special": true
267
  },
268
+ "40030": {
269
+ "content": "<extra_id_69>",
270
+ "lstrip": true,
271
  "normalized": false,
272
+ "rstrip": true,
273
  "single_word": false,
274
  "special": true
275
  },
276
+ "40031": {
277
+ "content": "<extra_id_68>",
278
+ "lstrip": true,
279
  "normalized": false,
280
+ "rstrip": true,
281
  "single_word": false,
282
  "special": true
283
  },
284
+ "40032": {
285
+ "content": "<extra_id_67>",
286
+ "lstrip": true,
287
  "normalized": false,
288
+ "rstrip": true,
289
  "single_word": false,
290
  "special": true
291
  },
292
+ "40033": {
293
+ "content": "<extra_id_66>",
294
+ "lstrip": true,
295
  "normalized": false,
296
+ "rstrip": true,
297
  "single_word": false,
298
  "special": true
299
  },
300
+ "40034": {
301
+ "content": "<extra_id_65>",
302
+ "lstrip": true,
303
  "normalized": false,
304
+ "rstrip": true,
305
  "single_word": false,
306
  "special": true
307
  },
308
+ "40035": {
309
+ "content": "<extra_id_64>",
310
+ "lstrip": true,
311
  "normalized": false,
312
+ "rstrip": true,
313
  "single_word": false,
314
  "special": true
315
  },
316
+ "40036": {
317
+ "content": "<extra_id_63>",
318
+ "lstrip": true,
319
  "normalized": false,
320
+ "rstrip": true,
321
  "single_word": false,
322
  "special": true
323
  },
324
+ "40037": {
325
+ "content": "<extra_id_62>",
326
+ "lstrip": true,
327
  "normalized": false,
328
+ "rstrip": true,
329
  "single_word": false,
330
  "special": true
331
  },
332
+ "40038": {
333
+ "content": "<extra_id_61>",
334
+ "lstrip": true,
335
  "normalized": false,
336
+ "rstrip": true,
337
  "single_word": false,
338
  "special": true
339
  },
340
+ "40039": {
341
+ "content": "<extra_id_60>",
342
+ "lstrip": true,
343
  "normalized": false,
344
+ "rstrip": true,
345
  "single_word": false,
346
  "special": true
347
  },
348
+ "40040": {
349
+ "content": "<extra_id_59>",
350
+ "lstrip": true,
351
  "normalized": false,
352
+ "rstrip": true,
353
  "single_word": false,
354
  "special": true
355
  },
356
+ "40041": {
357
+ "content": "<extra_id_58>",
358
+ "lstrip": true,
359
  "normalized": false,
360
+ "rstrip": true,
361
  "single_word": false,
362
  "special": true
363
  },
364
+ "40042": {
365
+ "content": "<extra_id_57>",
366
+ "lstrip": true,
367
  "normalized": false,
368
+ "rstrip": true,
369
  "single_word": false,
370
  "special": true
371
  },
372
+ "40043": {
373
+ "content": "<extra_id_56>",
374
+ "lstrip": true,
375
  "normalized": false,
376
+ "rstrip": true,
377
  "single_word": false,
378
  "special": true
379
  },
380
+ "40044": {
381
+ "content": "<extra_id_55>",
382
+ "lstrip": true,
383
  "normalized": false,
384
+ "rstrip": true,
385
  "single_word": false,
386
  "special": true
387
  },
388
+ "40045": {
389
+ "content": "<extra_id_54>",
390
+ "lstrip": true,
391
  "normalized": false,
392
+ "rstrip": true,
393
  "single_word": false,
394
  "special": true
395
  },
396
+ "40046": {
397
+ "content": "<extra_id_53>",
398
+ "lstrip": true,
399
  "normalized": false,
400
+ "rstrip": true,
401
  "single_word": false,
402
  "special": true
403
  },
404
+ "40047": {
405
+ "content": "<extra_id_52>",
406
+ "lstrip": true,
407
  "normalized": false,
408
+ "rstrip": true,
409
  "single_word": false,
410
  "special": true
411
  },
412
+ "40048": {
413
+ "content": "<extra_id_51>",
414
+ "lstrip": true,
415
  "normalized": false,
416
+ "rstrip": true,
417
  "single_word": false,
418
  "special": true
419
  },
420
+ "40049": {
421
+ "content": "<extra_id_50>",
422
+ "lstrip": true,
423
  "normalized": false,
424
+ "rstrip": true,
425
  "single_word": false,
426
  "special": true
427
  },
428
+ "40050": {
429
  "content": "<extra_id_49>",
430
+ "lstrip": true,
431
  "normalized": false,
432
+ "rstrip": true,
433
  "single_word": false,
434
  "special": true
435
  },
436
+ "40051": {
437
+ "content": "<extra_id_48>",
438
+ "lstrip": true,
439
  "normalized": false,
440
+ "rstrip": true,
441
  "single_word": false,
442
  "special": true
443
  },
444
+ "40052": {
445
+ "content": "<extra_id_47>",
446
+ "lstrip": true,
447
  "normalized": false,
448
+ "rstrip": true,
449
  "single_word": false,
450
  "special": true
451
  },
452
+ "40053": {
453
+ "content": "<extra_id_46>",
454
+ "lstrip": true,
455
  "normalized": false,
456
+ "rstrip": true,
457
  "single_word": false,
458
  "special": true
459
  },
460
+ "40054": {
461
+ "content": "<extra_id_45>",
462
+ "lstrip": true,
463
  "normalized": false,
464
+ "rstrip": true,
465
  "single_word": false,
466
  "special": true
467
  },
468
+ "40055": {
469
+ "content": "<extra_id_44>",
470
+ "lstrip": true,
471
  "normalized": false,
472
+ "rstrip": true,
473
  "single_word": false,
474
  "special": true
475
  },
476
+ "40056": {
477
+ "content": "<extra_id_43>",
478
+ "lstrip": true,
479
  "normalized": false,
480
+ "rstrip": true,
481
  "single_word": false,
482
  "special": true
483
  },
484
+ "40057": {
485
+ "content": "<extra_id_42>",
486
+ "lstrip": true,
487
  "normalized": false,
488
+ "rstrip": true,
489
  "single_word": false,
490
  "special": true
491
  },
492
+ "40058": {
493
+ "content": "<extra_id_41>",
494
+ "lstrip": true,
495
  "normalized": false,
496
+ "rstrip": true,
497
  "single_word": false,
498
  "special": true
499
  },
500
+ "40059": {
501
+ "content": "<extra_id_40>",
502
+ "lstrip": true,
503
  "normalized": false,
504
+ "rstrip": true,
505
  "single_word": false,
506
  "special": true
507
  },
508
+ "40060": {
509
+ "content": "<extra_id_39>",
510
+ "lstrip": true,
511
  "normalized": false,
512
+ "rstrip": true,
513
  "single_word": false,
514
  "special": true
515
  },
516
+ "40061": {
517
+ "content": "<extra_id_38>",
518
+ "lstrip": true,
519
  "normalized": false,
520
+ "rstrip": true,
521
  "single_word": false,
522
  "special": true
523
  },
524
+ "40062": {
525
+ "content": "<extra_id_37>",
526
+ "lstrip": true,
527
  "normalized": false,
528
+ "rstrip": true,
529
  "single_word": false,
530
  "special": true
531
  },
532
+ "40063": {
533
+ "content": "<extra_id_36>",
534
+ "lstrip": true,
535
  "normalized": false,
536
+ "rstrip": true,
537
  "single_word": false,
538
  "special": true
539
  },
540
+ "40064": {
541
+ "content": "<extra_id_35>",
542
+ "lstrip": true,
543
  "normalized": false,
544
+ "rstrip": true,
545
  "single_word": false,
546
  "special": true
547
  },
548
+ "40065": {
549
+ "content": "<extra_id_34>",
550
+ "lstrip": true,
551
  "normalized": false,
552
+ "rstrip": true,
553
  "single_word": false,
554
  "special": true
555
  },
556
+ "40066": {
557
+ "content": "<extra_id_33>",
558
+ "lstrip": true,
559
  "normalized": false,
560
+ "rstrip": true,
561
  "single_word": false,
562
  "special": true
563
  },
564
+ "40067": {
565
+ "content": "<extra_id_32>",
566
+ "lstrip": true,
567
  "normalized": false,
568
+ "rstrip": true,
569
  "single_word": false,
570
  "special": true
571
  },
572
+ "40068": {
573
+ "content": "<extra_id_31>",
574
+ "lstrip": true,
575
  "normalized": false,
576
+ "rstrip": true,
577
  "single_word": false,
578
  "special": true
579
  },
580
+ "40069": {
581
+ "content": "<extra_id_30>",
582
+ "lstrip": true,
583
  "normalized": false,
584
+ "rstrip": true,
585
  "single_word": false,
586
  "special": true
587
  },
588
+ "40070": {
589
+ "content": "<extra_id_29>",
590
+ "lstrip": true,
591
  "normalized": false,
592
+ "rstrip": true,
593
  "single_word": false,
594
  "special": true
595
  },
596
+ "40071": {
597
+ "content": "<extra_id_28>",
598
+ "lstrip": true,
599
  "normalized": false,
600
+ "rstrip": true,
601
  "single_word": false,
602
  "special": true
603
  },
604
+ "40072": {
605
+ "content": "<extra_id_27>",
606
+ "lstrip": true,
607
  "normalized": false,
608
+ "rstrip": true,
609
  "single_word": false,
610
  "special": true
611
  },
612
+ "40073": {
613
+ "content": "<extra_id_26>",
614
+ "lstrip": true,
615
  "normalized": false,
616
+ "rstrip": true,
617
  "single_word": false,
618
  "special": true
619
  },
620
+ "40074": {
621
+ "content": "<extra_id_25>",
622
+ "lstrip": true,
623
  "normalized": false,
624
+ "rstrip": true,
625
  "single_word": false,
626
  "special": true
627
  },
628
+ "40075": {
629
+ "content": "<extra_id_24>",
630
+ "lstrip": true,
631
  "normalized": false,
632
+ "rstrip": true,
633
  "single_word": false,
634
  "special": true
635
  },
636
+ "40076": {
637
+ "content": "<extra_id_23>",
638
+ "lstrip": true,
639
  "normalized": false,
640
+ "rstrip": true,
641
  "single_word": false,
642
  "special": true
643
  },
644
+ "40077": {
645
+ "content": "<extra_id_22>",
646
+ "lstrip": true,
647
  "normalized": false,
648
+ "rstrip": true,
649
  "single_word": false,
650
  "special": true
651
  },
652
+ "40078": {
653
+ "content": "<extra_id_21>",
654
+ "lstrip": true,
655
  "normalized": false,
656
+ "rstrip": true,
657
  "single_word": false,
658
  "special": true
659
  },
660
+ "40079": {
661
+ "content": "<extra_id_20>",
662
+ "lstrip": true,
663
  "normalized": false,
664
+ "rstrip": true,
665
  "single_word": false,
666
  "special": true
667
  },
668
+ "40080": {
669
+ "content": "<extra_id_19>",
670
+ "lstrip": true,
671
  "normalized": false,
672
+ "rstrip": true,
673
  "single_word": false,
674
  "special": true
675
  },
676
+ "40081": {
677
+ "content": "<extra_id_18>",
678
+ "lstrip": true,
679
  "normalized": false,
680
+ "rstrip": true,
681
  "single_word": false,
682
  "special": true
683
  },
684
+ "40082": {
685
+ "content": "<extra_id_17>",
686
+ "lstrip": true,
687
  "normalized": false,
688
+ "rstrip": true,
689
  "single_word": false,
690
  "special": true
691
  },
692
+ "40083": {
693
+ "content": "<extra_id_16>",
694
+ "lstrip": true,
695
  "normalized": false,
696
+ "rstrip": true,
697
  "single_word": false,
698
  "special": true
699
  },
700
+ "40084": {
701
+ "content": "<extra_id_15>",
702
+ "lstrip": true,
703
  "normalized": false,
704
+ "rstrip": true,
705
  "single_word": false,
706
  "special": true
707
  },
708
+ "40085": {
709
+ "content": "<extra_id_14>",
710
+ "lstrip": true,
711
  "normalized": false,
712
+ "rstrip": true,
713
  "single_word": false,
714
  "special": true
715
  },
716
+ "40086": {
717
+ "content": "<extra_id_13>",
718
+ "lstrip": true,
719
  "normalized": false,
720
+ "rstrip": true,
721
  "single_word": false,
722
  "special": true
723
  },
724
+ "40087": {
725
+ "content": "<extra_id_12>",
726
+ "lstrip": true,
727
  "normalized": false,
728
+ "rstrip": true,
729
  "single_word": false,
730
  "special": true
731
  },
732
+ "40088": {
733
+ "content": "<extra_id_11>",
734
+ "lstrip": true,
735
  "normalized": false,
736
+ "rstrip": true,
737
  "single_word": false,
738
  "special": true
739
  },
740
+ "40089": {
741
+ "content": "<extra_id_10>",
742
+ "lstrip": true,
743
  "normalized": false,
744
+ "rstrip": true,
745
  "single_word": false,
746
  "special": true
747
  },
748
+ "40090": {
749
+ "content": "<extra_id_9>",
750
+ "lstrip": true,
751
  "normalized": false,
752
+ "rstrip": true,
753
  "single_word": false,
754
  "special": true
755
  },
756
+ "40091": {
757
+ "content": "<extra_id_8>",
758
+ "lstrip": true,
759
  "normalized": false,
760
+ "rstrip": true,
761
  "single_word": false,
762
  "special": true
763
  },
764
+ "40092": {
765
+ "content": "<extra_id_7>",
766
+ "lstrip": true,
767
  "normalized": false,
768
+ "rstrip": true,
769
  "single_word": false,
770
  "special": true
771
  },
772
+ "40093": {
773
+ "content": "<extra_id_6>",
774
+ "lstrip": true,
775
  "normalized": false,
776
+ "rstrip": true,
777
  "single_word": false,
778
  "special": true
779
  },
780
+ "40094": {
781
+ "content": "<extra_id_5>",
782
+ "lstrip": true,
783
  "normalized": false,
784
+ "rstrip": true,
785
  "single_word": false,
786
  "special": true
787
  },
788
+ "40095": {
789
+ "content": "<extra_id_4>",
790
+ "lstrip": true,
791
  "normalized": false,
792
+ "rstrip": true,
793
  "single_word": false,
794
  "special": true
795
  },
796
+ "40096": {
797
+ "content": "<extra_id_3>",
798
+ "lstrip": true,
799
  "normalized": false,
800
+ "rstrip": true,
801
  "single_word": false,
802
  "special": true
803
  },
804
+ "40097": {
805
+ "content": "<extra_id_2>",
806
+ "lstrip": true,
807
  "normalized": false,
808
+ "rstrip": true,
809
  "single_word": false,
810
  "special": true
811
  },
812
+ "40098": {
813
+ "content": "<extra_id_1>",
814
+ "lstrip": true,
815
  "normalized": false,
816
+ "rstrip": true,
817
  "single_word": false,
818
  "special": true
819
  },
820
+ "40099": {
821
+ "content": "<extra_id_0>",
822
+ "lstrip": true,
823
  "normalized": false,
824
+ "rstrip": true,
825
  "single_word": false,
826
  "special": true
827
  }
 
928
  "<extra_id_98>",
929
  "<extra_id_99>"
930
  ],
 
931
  "clean_up_tokenization_spaces": false,
932
  "eos_token": "</s>",
933
  "extra_ids": 100,
934
  "extra_special_tokens": {},
935
+ "legacy": true,
936
  "model_max_length": 1000000000000000019884624838656,
937
  "pad_token": "<pad>",
938
+ "sp_model_kwargs": {},
939
+ "tokenizer_class": "T5Tokenizer",
940
  "unk_token": "<unk>"
941
  }
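The `added_tokens_decoder` entries in this config place the 100 sentinel tokens in descending order at the top of the ID range (`<extra_id_99>` at 40000 down to `<extra_id_0>` at 40099), whereas the removed entries used ascending IDs (for example `<extra_id_67>` at 8067). A minimal sketch of the new mapping, assuming a base vocabulary of 40000 pieces and `extra_ids` of 100 as shown above; the helper names `sentinel_id` and `sentinel_name` are hypothetical and not part of the repository:

```python
def sentinel_id(n: int, base_vocab: int = 40000, extra_ids: int = 100) -> int:
    """Return the token ID of <extra_id_n> under the descending layout."""
    if not 0 <= n < extra_ids:
        raise ValueError(f"n must be in [0, {extra_ids}), got {n}")
    # <extra_id_0> takes the highest ID, <extra_id_{extra_ids-1}> the lowest.
    return base_vocab + (extra_ids - 1 - n)


def sentinel_name(token_id: int, base_vocab: int = 40000, extra_ids: int = 100) -> str:
    """Inverse of sentinel_id: map a token ID back to its sentinel string."""
    offset = token_id - base_vocab
    if not 0 <= offset < extra_ids:
        raise ValueError(f"{token_id} is not a sentinel ID")
    return f"<extra_id_{extra_ids - 1 - offset}>"


if __name__ == "__main__":
    print(sentinel_id(0), sentinel_name(40000))
```

Checking a few values against the config entries (40050 for `<extra_id_49>`, 40033 for `<extra_id_66>`) is a quick way to confirm the layout before relying on it.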
src/train_t5.py CHANGED
@@ -2,6 +2,7 @@ import torch
  from datasets import load_dataset
  from transformers import (
      Trainer,
+     T5Config,
      T5TokenizerFast,
      TrainingArguments,
      DataCollatorForSeq2Seq,
@@ -15,9 +16,23 @@ data_path = "src/data/clean_corpus.jsonl"
  tokeniser_path = "src/tokeniser/"
  output_dir = "checkpoints/"

- # Load tokeniser and model
+ # Load tokeniser
  tokeniser = T5TokenizerFast.from_pretrained(tokeniser_path)
- model = T5ForConditionalGeneration.from_pretrained(base_model)
+ vocab_size = tokeniser.vocab_size
+ pad_token_id = tokeniser.pad_token_id
+
+ # Use custom vocab size for the model
+ config = T5Config(
+     vocab_size = vocab_size,
+     d_model = 512,
+     d_ff = 2048,
+     num_layers = 6,
+     num_heads = 8,
+     pad_token_id = pad_token_id,
+     decoder_start_token_id = pad_token_id
+ )
+
+ model = T5ForConditionalGeneration(config)


  def tokenise_function(example: dict) -> T5TokenizerFast:
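The new `T5Config` takes `vocab_size` from `tokeniser.vocab_size`. That property may not count the 100 `<extra_id_*>` tokens, depending on the tokenizer class and library version, so it can be worth checking that the embedding table covers the highest ID listed in `added_tokens_decoder`. A minimal sketch of such a check using plain dictionaries; `min_embedding_rows` is a hypothetical helper, and the sample IDs are copied from the checkpoint `tokenizer_config.json` in this commit:

```python
def min_embedding_rows(added_tokens_decoder: dict, base_vocab_size: int) -> int:
    """Smallest embedding size that covers every base and added token ID."""
    # Keys in tokenizer_config.json are strings, so convert before comparing.
    highest_added = max((int(k) for k in added_tokens_decoder), default=-1)
    return max(base_vocab_size, highest_added + 1)


# Abbreviated sample: only the first and last few entries from the config.
sample_decoder = {
    "0": {"content": "<pad>"},
    "1": {"content": "<unk>"},
    "3": {"content": "</s>"},
    "40000": {"content": "<extra_id_99>"},
    "40099": {"content": "<extra_id_0>"},
}

if __name__ == "__main__":
    required = min_embedding_rows(sample_decoder, base_vocab_size=40000)
    print(f"embedding table needs at least {required} rows")
```

In `transformers`, `len(tokeniser)` reports the vocabulary size including added tokens, which is one way to obtain this number directly from a loaded tokenizer.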
src/train_tokeniser.py CHANGED
@@ -1,52 +1,39 @@
  import json
- from transformers import T5TokenizerFast
- from tokenizers.normalizers import Lowercase
- from tokenizers.pre_tokenizers import Whitespace
- from tokenizers import Tokenizer, models, trainers
+ import sentencepiece as spm
+ from transformers import T5Tokenizer


  # Load corpus data
  corpus = []

- with open("src/data/clean_corpus.jsonl", "r", encoding = "utf-8") as f_in:
-     for i, line in enumerate(f_in):
-         if i >= 100000: # take only 100000 records for the tokeniser (no need to load everything in the corpus)
-             break
-
-         item = json.loads(line)
-         src = item["transliteration"]["src"]
-         tgt = item["transliteration"]["tgt"]
-
-         # Feed both sides into tokeniser training
-         corpus.append(src)
-         corpus.append(tgt)
-
- # Initialise a tokenizer
- tokeniser = Tokenizer(models.BPE(unk_token = "<unk>"))
-
- # Normalisation (NFD and StripAccents are not used due to characters with diacritics, for instance)
- tokeniser.normalizer = Lowercase()
-
- # Basic whitespace pre-tokenisation
- tokeniser.pre_tokenizer = Whitespace()
-
- # Trainer
- trainer = trainers.BpeTrainer(
-     vocab_size = 8000,
-     special_tokens = ["<pad>", "<s>", "</s>", "<unk>"]
+ with open("src/data/tokeniser_corpus.txt", "w", encoding = "utf-8") as f_out:
+     with open("src/data/clean_corpus.jsonl", "r", encoding = "utf-8") as f_in:
+         for i, line in enumerate(f_in):
+             if i >= 500000: # take only 500000 records for the tokeniser (no need to load everything in the corpus)
+                 break
+
+             item = json.loads(line)
+             src = item["transliteration"]["src"]
+             tgt = item["transliteration"]["tgt"]
+
+             f_out.write(src + "\n")
+             f_out.write(tgt + "\n")
+
+ # Train the sentence piece model
+ spm.SentencePieceTrainer.Train(
+     input = "src/data/tokeniser_corpus.txt",
+     model_prefix = "src/tokeniser/dalat5_sp",
+     vocab_size = 40000,
+     model_type = "unigram", # worth testing with "bpe"
+     character_coverage = 1.0, # to preserve rare characters like ä, ñ, etc.
+     pad_id = 0,
+     unk_id = 1,
+     bos_id = 2,
+     eos_id = 3,
+     user_defined_symbols = ["<pad>", "<s>", "</s>"]
  )

- # Train from the corpus
- tokeniser.train_from_iterator(corpus, trainer)
-
- # Wrap it for Hugging Face
- hf_tokeniser = T5TokenizerFast(
-     tokenizer_object = tokeniser,
-     unk_token = "<unk>",
-     pad_token = "<pad>",
-     bos_token = "<s>",
-     eos_token = "</s>",
- )
+ # Convert to a HF-compatible format
+ tokenizer = T5Tokenizer.from_pretrained("src/tokeniser/dalat5_sp.model")

- # Save the HF-compliant tokeniser
- hf_tokeniser.save_pretrained("src/tokeniser/")
+ tokenizer.save_pretrained("src/tokeniser/")
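The corpus-export step in the new `train_tokeniser.py` can be sketched as a standalone function, which makes it easier to try on a small file before training. This version assumes the same JSONL layout (`transliteration.src` and `transliteration.tgt` per record); the function name `export_pairs` and its `limit` parameter are illustrative and not taken from the repository:

```python
import json
from itertools import islice


def export_pairs(jsonl_path: str, out_path: str, limit: int = 500000) -> int:
    """Write src/tgt strings from the first `limit` records, one per line.

    Returns the number of lines written to `out_path`.
    """
    written = 0
    with open(jsonl_path, "r", encoding="utf-8") as f_in, \
         open(out_path, "w", encoding="utf-8") as f_out:
        # islice stops after `limit` records without reading the rest of the file.
        for line in islice(f_in, limit):
            pair = json.loads(line)["transliteration"]
            f_out.write(pair["src"] + "\n")
            f_out.write(pair["tgt"] + "\n")
            written += 2
    return written
```

Both sides of each pair go into the same text file, so the tokeniser would see source and target scripts in equal proportion, as in the script above.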
 
tokenizer.json DELETED
The diff for this file is too large to render.
 
tokenizer_config.json DELETED
@@ -1,939 +0,0 @@
1
- {
2
- "add_prefix_space": null,
3
- "added_tokens_decoder": {
4
- "0": {
5
- "content": "<pad>",
6
- "lstrip": false,
7
- "normalized": false,
8
- "rstrip": false,
9
- "single_word": false,
10
- "special": true
11
- },
12
- "1": {
13
- "content": "</s>",
14
- "lstrip": false,
15
- "normalized": false,
16
- "rstrip": false,
17
- "single_word": false,
18
- "special": true
19
- },
20
- "2": {
21
- "content": "<unk>",
22
- "lstrip": false,
23
- "normalized": false,
24
- "rstrip": false,
25
- "single_word": false,
26
- "special": true
27
- },
28
- "32000": {
29
- "content": "<extra_id_99>",
30
- "lstrip": false,
31
- "normalized": false,
32
- "rstrip": false,
33
- "single_word": false,
34
- "special": true
35
- },
36
- "32001": {
37
- "content": "<extra_id_98>",
38
- "lstrip": false,
39
- "normalized": false,
40
- "rstrip": false,
41
- "single_word": false,
42
- "special": true
43
- },
44
- "32002": {
45
- "content": "<extra_id_97>",
46
- "lstrip": false,
47
- "normalized": false,
48
- "rstrip": false,
49
- "single_word": false,
50
- "special": true
51
- },
52
- "32003": {
53
- "content": "<extra_id_96>",
54
- "lstrip": false,
55
- "normalized": false,
56
- "rstrip": false,
57
- "single_word": false,
58
- "special": true
59
- },
60
- "32004": {
61
- "content": "<extra_id_95>",
62
- "lstrip": false,
63
- "normalized": false,
64
- "rstrip": false,
65
- "single_word": false,
66
- "special": true
67
- },
68
- "32005": {
69
- "content": "<extra_id_94>",
70
- "lstrip": false,
71
- "normalized": false,
72
- "rstrip": false,
73
- "single_word": false,
74
- "special": true
75
- },
76
- "32006": {
77
- "content": "<extra_id_93>",
78
- "lstrip": false,
79
- "normalized": false,
80
- "rstrip": false,
81
- "single_word": false,
82
- "special": true
83
- },
84
- "32007": {
85
- "content": "<extra_id_92>",
86
- "lstrip": false,
87
- "normalized": false,
88
- "rstrip": false,
89
- "single_word": false,
90
- "special": true
91
- },
92
- "32008": {
93
- "content": "<extra_id_91>",
94
- "lstrip": false,
95
- "normalized": false,
96
- "rstrip": false,
97
- "single_word": false,
98
- "special": true
99
- },
100
- "32009": {
101
- "content": "<extra_id_90>",
102
- "lstrip": false,
103
- "normalized": false,
104
- "rstrip": false,
105
- "single_word": false,
106
- "special": true
107
- },
108
- "32010": {
109
- "content": "<extra_id_89>",
110
- "lstrip": false,
111
- "normalized": false,
112
- "rstrip": false,
113
- "single_word": false,
114
- "special": true
115
- },
116
- "32011": {
117
- "content": "<extra_id_88>",
118
- "lstrip": false,
119
- "normalized": false,
120
- "rstrip": false,
121
- "single_word": false,
122
- "special": true
123
- },
124
- "32012": {
125
- "content": "<extra_id_87>",
126
- "lstrip": false,
127
- "normalized": false,
128
- "rstrip": false,
129
- "single_word": false,
130
- "special": true
131
- },
132
- "32013": {
133
- "content": "<extra_id_86>",
134
- "lstrip": false,
135
- "normalized": false,
136
- "rstrip": false,
137
- "single_word": false,
138
- "special": true
139
- },
140
- "32014": {
141
- "content": "<extra_id_85>",
142
- "lstrip": false,
143
- "normalized": false,
144
- "rstrip": false,
145
- "single_word": false,
146
- "special": true
147
- },
148
- "32015": {
149
- "content": "<extra_id_84>",
150
- "lstrip": false,
151
- "normalized": false,
152
- "rstrip": false,
153
- "single_word": false,
154
- "special": true
155
- },
156
- "32016": {
157
- "content": "<extra_id_83>",
158
- "lstrip": false,
159
- "normalized": false,
160
- "rstrip": false,
161
- "single_word": false,
162
- "special": true
163
- },
164
- "32017": {
165
- "content": "<extra_id_82>",
166
- "lstrip": false,
167
- "normalized": false,
168
- "rstrip": false,
169
- "single_word": false,
170
- "special": true
171
- },
172
- "32018": {
173
- "content": "<extra_id_81>",
174
- "lstrip": false,
175
- "normalized": false,
176
- "rstrip": false,
177
- "single_word": false,
178
- "special": true
179
- },
- "32019": {
- "content": "<extra_id_80>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "32020": {
- "content": "<extra_id_79>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "32021": {
- "content": "<extra_id_78>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "32022": {
- "content": "<extra_id_77>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "32023": {
- "content": "<extra_id_76>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "32024": {
- "content": "<extra_id_75>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "32025": {
- "content": "<extra_id_74>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "32026": {
- "content": "<extra_id_73>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "32027": {
- "content": "<extra_id_72>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "32028": {
- "content": "<extra_id_71>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "32029": {
- "content": "<extra_id_70>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "32030": {
- "content": "<extra_id_69>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "32031": {
- "content": "<extra_id_68>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "32032": {
- "content": "<extra_id_67>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "32033": {
- "content": "<extra_id_66>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "32034": {
- "content": "<extra_id_65>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "32035": {
- "content": "<extra_id_64>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "32036": {
- "content": "<extra_id_63>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "32037": {
- "content": "<extra_id_62>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "32038": {
- "content": "<extra_id_61>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "32039": {
- "content": "<extra_id_60>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "32040": {
- "content": "<extra_id_59>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "32041": {
- "content": "<extra_id_58>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "32042": {
- "content": "<extra_id_57>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "32043": {
- "content": "<extra_id_56>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "32044": {
- "content": "<extra_id_55>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "32045": {
- "content": "<extra_id_54>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "32046": {
- "content": "<extra_id_53>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "32047": {
- "content": "<extra_id_52>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "32048": {
- "content": "<extra_id_51>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "32049": {
- "content": "<extra_id_50>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "32050": {
- "content": "<extra_id_49>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "32051": {
- "content": "<extra_id_48>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "32052": {
- "content": "<extra_id_47>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "32053": {
- "content": "<extra_id_46>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "32054": {
- "content": "<extra_id_45>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "32055": {
- "content": "<extra_id_44>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "32056": {
- "content": "<extra_id_43>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "32057": {
- "content": "<extra_id_42>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "32058": {
- "content": "<extra_id_41>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "32059": {
- "content": "<extra_id_40>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "32060": {
- "content": "<extra_id_39>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "32061": {
- "content": "<extra_id_38>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "32062": {
- "content": "<extra_id_37>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "32063": {
- "content": "<extra_id_36>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "32064": {
- "content": "<extra_id_35>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "32065": {
- "content": "<extra_id_34>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "32066": {
- "content": "<extra_id_33>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "32067": {
- "content": "<extra_id_32>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "32068": {
- "content": "<extra_id_31>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "32069": {
- "content": "<extra_id_30>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "32070": {
- "content": "<extra_id_29>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "32071": {
- "content": "<extra_id_28>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "32072": {
- "content": "<extra_id_27>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "32073": {
- "content": "<extra_id_26>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "32074": {
- "content": "<extra_id_25>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "32075": {
- "content": "<extra_id_24>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "32076": {
- "content": "<extra_id_23>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "32077": {
- "content": "<extra_id_22>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "32078": {
- "content": "<extra_id_21>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "32079": {
- "content": "<extra_id_20>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "32080": {
- "content": "<extra_id_19>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "32081": {
- "content": "<extra_id_18>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "32082": {
- "content": "<extra_id_17>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "32083": {
- "content": "<extra_id_16>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "32084": {
- "content": "<extra_id_15>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "32085": {
- "content": "<extra_id_14>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "32086": {
- "content": "<extra_id_13>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "32087": {
- "content": "<extra_id_12>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "32088": {
- "content": "<extra_id_11>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "32089": {
- "content": "<extra_id_10>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "32090": {
- "content": "<extra_id_9>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "32091": {
- "content": "<extra_id_8>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "32092": {
- "content": "<extra_id_7>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "32093": {
- "content": "<extra_id_6>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "32094": {
- "content": "<extra_id_5>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "32095": {
- "content": "<extra_id_4>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "32096": {
- "content": "<extra_id_3>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "32097": {
- "content": "<extra_id_2>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "32098": {
- "content": "<extra_id_1>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "32099": {
- "content": "<extra_id_0>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- }
- },
- "additional_special_tokens": [
- "<extra_id_0>",
- "<extra_id_1>",
- "<extra_id_2>",
- "<extra_id_3>",
- "<extra_id_4>",
- "<extra_id_5>",
- "<extra_id_6>",
- "<extra_id_7>",
- "<extra_id_8>",
- "<extra_id_9>",
- "<extra_id_10>",
- "<extra_id_11>",
- "<extra_id_12>",
- "<extra_id_13>",
- "<extra_id_14>",
- "<extra_id_15>",
- "<extra_id_16>",
- "<extra_id_17>",
- "<extra_id_18>",
- "<extra_id_19>",
- "<extra_id_20>",
- "<extra_id_21>",
- "<extra_id_22>",
- "<extra_id_23>",
- "<extra_id_24>",
- "<extra_id_25>",
- "<extra_id_26>",
- "<extra_id_27>",
- "<extra_id_28>",
- "<extra_id_29>",
- "<extra_id_30>",
- "<extra_id_31>",
- "<extra_id_32>",
- "<extra_id_33>",
- "<extra_id_34>",
- "<extra_id_35>",
- "<extra_id_36>",
- "<extra_id_37>",
- "<extra_id_38>",
- "<extra_id_39>",
- "<extra_id_40>",
- "<extra_id_41>",
- "<extra_id_42>",
- "<extra_id_43>",
- "<extra_id_44>",
- "<extra_id_45>",
- "<extra_id_46>",
- "<extra_id_47>",
- "<extra_id_48>",
- "<extra_id_49>",
- "<extra_id_50>",
- "<extra_id_51>",
- "<extra_id_52>",
- "<extra_id_53>",
- "<extra_id_54>",
- "<extra_id_55>",
- "<extra_id_56>",
- "<extra_id_57>",
- "<extra_id_58>",
- "<extra_id_59>",
- "<extra_id_60>",
- "<extra_id_61>",
- "<extra_id_62>",
- "<extra_id_63>",
- "<extra_id_64>",
- "<extra_id_65>",
- "<extra_id_66>",
- "<extra_id_67>",
- "<extra_id_68>",
- "<extra_id_69>",
- "<extra_id_70>",
- "<extra_id_71>",
- "<extra_id_72>",
- "<extra_id_73>",
- "<extra_id_74>",
- "<extra_id_75>",
- "<extra_id_76>",
- "<extra_id_77>",
- "<extra_id_78>",
- "<extra_id_79>",
- "<extra_id_80>",
- "<extra_id_81>",
- "<extra_id_82>",
- "<extra_id_83>",
- "<extra_id_84>",
- "<extra_id_85>",
- "<extra_id_86>",
- "<extra_id_87>",
- "<extra_id_88>",
- "<extra_id_89>",
- "<extra_id_90>",
- "<extra_id_91>",
- "<extra_id_92>",
- "<extra_id_93>",
- "<extra_id_94>",
- "<extra_id_95>",
- "<extra_id_96>",
- "<extra_id_97>",
- "<extra_id_98>",
- "<extra_id_99>"
- ],
- "clean_up_tokenization_spaces": true,
- "eos_token": "</s>",
- "extra_ids": 100,
- "extra_special_tokens": {},
- "model_max_length": 512,
- "pad_token": "<pad>",
- "tokenizer_class": "T5Tokenizer",
- "unk_token": "<unk>"
- }
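
Note on the removed sentinel layout: in the entries deleted above, the sentinel tokens run in reverse order of their ids, so that "<extra_id_0>" sits at id 32099 and "<extra_id_90>" at id 32009. The following Python sketch captures that mapping. It is only an illustration: the function names are hypothetical, and the ids for "<extra_id_91>" through "<extra_id_99>" are assumed to continue the same pattern, since those entries fall outside this part of the diff.

```python
import re

# Values read off the deleted tokenizer_config.json shown above.
EXTRA_IDS = 100            # "extra_ids": 100
LAST_SENTINEL_ID = 32099   # id of "<extra_id_0>" in added_tokens_decoder

_SENTINEL_RE = re.compile(r"<extra_id_(\d+)>")


def sentinel_to_id(token: str) -> int:
    """Map '<extra_id_N>' to its id under the removed config (hypothetical helper)."""
    match = _SENTINEL_RE.fullmatch(token)
    if match is None:
        raise ValueError(f"not a sentinel token: {token!r}")
    n = int(match.group(1))
    if not 0 <= n < EXTRA_IDS:
        raise ValueError(f"sentinel index out of range: {n}")
    # Sentinels are laid out in reverse: a higher N means a lower id.
    return LAST_SENTINEL_ID - n


def id_to_sentinel(token_id: int) -> str:
    """Inverse of sentinel_to_id (hypothetical helper)."""
    n = LAST_SENTINEL_ID - token_id
    if not 0 <= n < EXTRA_IDS:
        raise ValueError(f"id is not a sentinel id: {token_id}")
    return f"<extra_id_{n}>"
```

A check such as sentinel_to_id("<extra_id_90>") == 32009 can be compared directly against the deleted lines to confirm the pattern before relying on it.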