Commit 7983357 (verified), committed by wissamantoun
1 Parent(s): 59857ec

Upload folder using huggingface_hub

README.md ADDED
@@ -0,0 +1,186 @@
1
+ ---
2
+ library_name: transformers
3
+ datasets:
4
+ - WebOrganizer/TopicAnnotations-Llama-3.1-8B
5
+ - WebOrganizer/TopicAnnotations-Llama-3.1-405B-FP8
6
+ base_model:
7
+ - answerdotai/ModernBERT-base
8
+ ---
9
+ # wissamantoun/WebOrganizer-TopicClassifier-ModernBERT
10
+
11
+ [[Paper](https://arxiv.org/abs/2502.10341)] [[Website](https://weborganizer.allenai.org)] [[GitHub](https://github.com/CodeCreator/WebOrganizer)]
12
+
13
+ *All credit goes to the original authors of the model and dataset. This is a retraining of the original classifier on a different base model.*
14
+
15
+ The TopicClassifier organizes web content into 24 categories based on the URL and text contents of web pages.
16
+ The model is a [ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base) with 149M parameters fine-tuned on the following training data:
17
+
18
+ 1. [WebOrganizer/TopicAnnotations-Llama-3.1-8B](https://huggingface.co/datasets/WebOrganizer/TopicAnnotations-Llama-3.1-8B): 1M documents annotated by Llama-3.1-8B (first-stage training)
19
+ 2. [WebOrganizer/TopicAnnotations-Llama-3.1-405B-FP8](https://huggingface.co/datasets/WebOrganizer/TopicAnnotations-Llama-3.1-405B-FP8): 100K documents annotated by Llama-3.1-405B-FP8 (second-stage training)
20
+
21
+ #### All Domain Classifiers
22
+ - [wissamantoun/WebOrganizer-FormatClassifier-ModernBERT](https://huggingface.co/wissamantoun/WebOrganizer-FormatClassifier-ModernBERT)
23
+ - [wissamantoun/WebOrganizer-TopicClassifier-ModernBERT](https://huggingface.co/wissamantoun/WebOrganizer-TopicClassifier-ModernBERT) *← you are here!*
24
+
25
+ ## Usage
26
+
27
+ This classifier expects input in the following format:
28
+ ```
29
+ {url}
30
+
31
+ {text}
32
+ ```
33
+
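The format above can be produced with a one-line helper. This is a minimal sketch, assuming the URL and page text are already available as Python strings; the name `format_web_page` is hypothetical and not part of the original repository.

```python
def format_web_page(url: str, text: str) -> str:
    # Layout expected by the classifier: URL, one blank line, then the page text.
    return f"{url}\n\n{text}"


# Illustrative values only.
web_page = format_web_page("http://www.example.com", "How to build a computer from scratch?")
```

The resulting string can be passed to the tokenizer in place of a hand-written literal.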
34
+ Example:
35
+ ```python
36
+ from transformers import AutoTokenizer, AutoModelForSequenceClassification
37
+
38
+ # ModernBERT ships with transformers (the config lists version 4.50.0), so no
39
+ # trust_remote_code or custom attention flags are needed.
40
+ tokenizer = AutoTokenizer.from_pretrained("wissamantoun/WebOrganizer-TopicClassifier-ModernBERT")
41
+ model = AutoModelForSequenceClassification.from_pretrained(
42
+     "wissamantoun/WebOrganizer-TopicClassifier-ModernBERT")
43
+
44
+ web_page = """http://www.example.com
45
+
46
+ How to build a computer from scratch? Here are the components you need..."""
47
+
48
+ inputs = tokenizer([web_page], return_tensors="pt")
49
+ outputs = model(**inputs)
50
+
51
+ probs = outputs.logits.softmax(dim=-1)
52
+ print(probs.argmax(dim=-1))
53
+ # -> 5 ("Hardware" topic)
54
+ ```
55
+
56
+ You can convert the `logits` of the model with a softmax to obtain a probability distribution over the following 24 categories. They are listed in label order, so item *n* in the list has label id *n* − 1 (see also `id2label` and `label2id` in the model config):
57
+ 1. Adult
58
+ 2. Art & Design
59
+ 3. Software Dev.
60
+ 4. Crime & Law
61
+ 5. Education & Jobs
62
+ 6. Hardware
63
+ 7. Entertainment
64
+ 8. Social Life
65
+ 9. Fashion & Beauty
66
+ 10. Finance & Business
67
+ 11. Food & Dining
68
+ 12. Games
69
+ 13. Health
70
+ 14. History
71
+ 15. Home & Hobbies
72
+ 16. Industrial
73
+ 17. Literature
74
+ 18. Politics
75
+ 19. Religion
76
+ 20. Science & Tech.
77
+ 21. Software
78
+ 22. Sports & Fitness
79
+ 23. Transportation
80
+ 24. Travel
81
+
82
+ The full definitions of the categories can be found in the [taxonomy config](https://github.com/CodeCreator/WebOrganizer/blob/main/define_domains/taxonomies/topics.yaml).
83
+
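To turn a row of logits into named topics, the softmax and label lookup can be sketched without any framework. This is an illustration under assumptions: the helper names `softmax` and `top_topics` are hypothetical, the logits are made up, and the topic names are copied from the list above in label order.

```python
import math

# Topic names in label order (list index == label id), copied from the list above.
TOPICS = [
    "Adult", "Art & Design", "Software Dev.", "Crime & Law", "Education & Jobs",
    "Hardware", "Entertainment", "Social Life", "Fashion & Beauty",
    "Finance & Business", "Food & Dining", "Games", "Health", "History",
    "Home & Hobbies", "Industrial", "Literature", "Politics", "Religion",
    "Science & Tech.", "Software", "Sports & Fitness", "Transportation", "Travel",
]


def softmax(logits):
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_topics(logits, k=3):
    # Return (label id, topic name, probability) for the k most likely topics.
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(i, TOPICS[i], probs[i]) for i in ranked[:k]]


# Made-up logits for illustration only; real values come from `outputs.logits`.
example_logits = [0.0] * len(TOPICS)
example_logits[5] = 4.0
example_logits[2] = 2.0
best = top_topics(example_logits, k=2)
```

With real model output, `outputs.logits[0].tolist()` would take the place of `example_logits`.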
84
+ ## Scores
85
+ ```
86
+ ***** pred metrics *****
87
+ test_accuracy = 0.8585
88
+ test_accuracy__0 = 0.9346
89
+ test_accuracy__1 = 0.7317
90
+ test_accuracy__10 = 0.9148
91
+ test_accuracy__11 = 0.8927
92
+ test_accuracy__12 = 0.8687
93
+ test_accuracy__13 = 0.814
94
+ test_accuracy__14 = 0.8616
95
+ test_accuracy__15 = 0.7179
96
+ test_accuracy__16 = 0.855
97
+ test_accuracy__17 = 0.8246
98
+ test_accuracy__18 = 0.907
99
+ test_accuracy__19 = 0.8333
100
+ test_accuracy__2 = 0.866
101
+ test_accuracy__20 = 0.8294
102
+ test_accuracy__21 = 0.9441
103
+ test_accuracy__22 = 0.8788
104
+ test_accuracy__23 = 0.9
105
+ test_accuracy__3 = 0.847
106
+ test_accuracy__4 = 0.8442
107
+ test_accuracy__5 = 0.8189
108
+ test_accuracy__6 = 0.8997
109
+ test_accuracy__7 = 0.7295
110
+ test_accuracy__8 = 0.8937
111
+ test_accuracy__9 = 0.8665
112
+ test_accuracy_conf50 = 0.8674
113
+ test_accuracy_conf50__0 = 0.9434
114
+ test_accuracy_conf50__1 = 0.7453
115
+ test_accuracy_conf50__10 = 0.93
116
+ test_accuracy_conf50__11 = 0.8958
117
+ test_accuracy_conf50__12 = 0.8768
118
+ test_accuracy_conf50__13 = 0.8193
119
+ test_accuracy_conf50__14 = 0.8691
120
+ test_accuracy_conf50__15 = 0.7237
121
+ test_accuracy_conf50__16 = 0.864
122
+ test_accuracy_conf50__17 = 0.8358
123
+ test_accuracy_conf50__18 = 0.91
124
+ test_accuracy_conf50__19 = 0.8481
125
+ test_accuracy_conf50__2 = 0.8768
126
+ test_accuracy_conf50__20 = 0.8434
127
+ test_accuracy_conf50__21 = 0.9505
128
+ test_accuracy_conf50__22 = 0.8844
129
+ test_accuracy_conf50__23 = 0.9028
130
+ test_accuracy_conf50__3 = 0.8571
131
+ test_accuracy_conf50__4 = 0.851
132
+ test_accuracy_conf50__5 = 0.8206
133
+ test_accuracy_conf50__6 = 0.9071
134
+ test_accuracy_conf50__7 = 0.7442
135
+ test_accuracy_conf50__8 = 0.9006
136
+ test_accuracy_conf50__9 = 0.8761
137
+ test_accuracy_conf75 = 0.9178 <--- Metric from the paper
138
+ test_accuracy_conf75__0 = 0.95
139
+ test_accuracy_conf75__1 = 0.8413
140
+ test_accuracy_conf75__10 = 0.9556
141
+ test_accuracy_conf75__11 = 0.9298
142
+ test_accuracy_conf75__12 = 0.9299
143
+ test_accuracy_conf75__13 = 0.8788
144
+ test_accuracy_conf75__14 = 0.9126
145
+ test_accuracy_conf75__15 = 0.8253
146
+ test_accuracy_conf75__16 = 0.8885
147
+ test_accuracy_conf75__17 = 0.8968
148
+ test_accuracy_conf75__18 = 0.938
149
+ test_accuracy_conf75__19 = 0.9113
150
+ test_accuracy_conf75__2 = 0.9029
151
+ test_accuracy_conf75__20 = 0.8966
152
+ test_accuracy_conf75__21 = 0.968
153
+ test_accuracy_conf75__22 = 0.9225
154
+ test_accuracy_conf75__23 = 0.9444
155
+ test_accuracy_conf75__3 = 0.9319
156
+ test_accuracy_conf75__4 = 0.8976
157
+ test_accuracy_conf75__5 = 0.9167
158
+ test_accuracy_conf75__6 = 0.9483
159
+ test_accuracy_conf75__7 = 0.804
160
+ test_accuracy_conf75__8 = 0.9448
161
+ test_accuracy_conf75__9 = 0.932
162
+ test_accuracy_label_average = 0.8531
163
+ test_accuracy_label_average_conf50 = 0.8615
164
+ test_accuracy_label_average_conf75 = 0.9111
165
+ test_accuracy_label_min = 0.7179
166
+ test_accuracy_label_min_conf50 = 0.7237
167
+ test_accuracy_label_min_conf75 = 0.804 <--- Metric from the paper
168
+ test_loss = 0.4694
169
+ test_proportion_conf50 = 0.9811
170
+ test_proportion_conf75 = 0.8535
171
+ test_runtime = 0:00:08.39
172
+ test_samples_per_second = 1191.144
173
+ test_steps_per_second = 37.283
174
+ ```
175
+
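The metric names suggest that `test_accuracy_label_average` is the unweighted mean of the per-label accuracies and `test_accuracy_label_min` is the lowest of them. The sketch below checks that reading against the rounded numbers in the block above; the variable names are illustrative, and the values are re-ordered by label id because the log lists them in string order (0, 1, 10, 11, ...).

```python
# Per-label test accuracies from the block above, indexed by label id 0..23.
per_label = [
    0.9346, 0.7317, 0.8660, 0.8470, 0.8442, 0.8189, 0.8997, 0.7295,
    0.8937, 0.8665, 0.9148, 0.8927, 0.8687, 0.8140, 0.8616, 0.7179,
    0.8550, 0.8246, 0.9070, 0.8333, 0.8294, 0.9441, 0.8788, 0.9000,
]

# Unweighted mean over labels (each label counts equally, whatever its size).
label_average = sum(per_label) / len(per_label)

# Worst-performing label and its accuracy.
label_min = min(per_label)
worst_label_id = per_label.index(label_min)

print(f"label average: {label_average:.4f}")
print(f"label min: {label_min:.4f} (label id {worst_label_id})")
```

The mean agrees with the reported 0.8531 to four decimals, and the minimum falls on label id 15, which `id2label` maps to "Industrial".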
176
+
177
+
178
+ ## Citation
179
+ ```bibtex
180
+ @article{wettig2025organize,
181
+ title={Organize the Web: Constructing Domains Enhances Pre-Training Data Curation},
182
+ author={Alexander Wettig and Kyle Lo and Sewon Min and Hannaneh Hajishirzi and Danqi Chen and Luca Soldaini},
183
+ journal={arXiv preprint arXiv:2502.10341},
184
+ year={2025}
185
+ }
186
+ ```
all_results.json ADDED
@@ -0,0 +1,182 @@
1
+ {
2
+ "epoch": 4.9728,
3
+ "eval_validation.parquet_accuracy": 0.8558,
4
+ "eval_validation.parquet_accuracy__0": 0.8627450980392157,
5
+ "eval_validation.parquet_accuracy__1": 0.7607361963190185,
6
+ "eval_validation.parquet_accuracy__10": 0.9035294117647059,
7
+ "eval_validation.parquet_accuracy__11": 0.8764478764478765,
8
+ "eval_validation.parquet_accuracy__12": 0.8980667838312829,
9
+ "eval_validation.parquet_accuracy__13": 0.7473118279569892,
10
+ "eval_validation.parquet_accuracy__14": 0.839943342776204,
11
+ "eval_validation.parquet_accuracy__15": 0.8427947598253275,
12
+ "eval_validation.parquet_accuracy__16": 0.830945558739255,
13
+ "eval_validation.parquet_accuracy__17": 0.839527027027027,
14
+ "eval_validation.parquet_accuracy__18": 0.8801369863013698,
15
+ "eval_validation.parquet_accuracy__19": 0.8117647058823529,
16
+ "eval_validation.parquet_accuracy__2": 0.7860696517412935,
17
+ "eval_validation.parquet_accuracy__20": 0.7633802816901408,
18
+ "eval_validation.parquet_accuracy__21": 0.9289617486338798,
19
+ "eval_validation.parquet_accuracy__22": 0.8562691131498471,
20
+ "eval_validation.parquet_accuracy__23": 0.8805460750853242,
21
+ "eval_validation.parquet_accuracy__3": 0.8660714285714286,
22
+ "eval_validation.parquet_accuracy__4": 0.8530884808013356,
23
+ "eval_validation.parquet_accuracy__5": 0.8694029850746269,
24
+ "eval_validation.parquet_accuracy__6": 0.900117508813161,
25
+ "eval_validation.parquet_accuracy__7": 0.7741935483870968,
26
+ "eval_validation.parquet_accuracy__8": 0.8904494382022472,
27
+ "eval_validation.parquet_accuracy__9": 0.8669833729216152,
28
+ "eval_validation.parquet_accuracy_conf50": 0.8663669799754802,
29
+ "eval_validation.parquet_accuracy_conf50__0": 0.8712871287128713,
30
+ "eval_validation.parquet_accuracy_conf50__1": 0.7711598746081505,
31
+ "eval_validation.parquet_accuracy_conf50__10": 0.9078014184397163,
32
+ "eval_validation.parquet_accuracy_conf50__11": 0.8823529411764706,
33
+ "eval_validation.parquet_accuracy_conf50__12": 0.9089285714285714,
34
+ "eval_validation.parquet_accuracy_conf50__13": 0.7828571428571428,
35
+ "eval_validation.parquet_accuracy_conf50__14": 0.8511560693641619,
36
+ "eval_validation.parquet_accuracy_conf50__15": 0.8565022421524664,
37
+ "eval_validation.parquet_accuracy_conf50__16": 0.8357771260997068,
38
+ "eval_validation.parquet_accuracy_conf50__17": 0.8456260720411664,
39
+ "eval_validation.parquet_accuracy_conf50__18": 0.8858131487889274,
40
+ "eval_validation.parquet_accuracy_conf50__19": 0.8277945619335347,
41
+ "eval_validation.parquet_accuracy_conf50__2": 0.8041237113402062,
42
+ "eval_validation.parquet_accuracy_conf50__20": 0.7794117647058824,
43
+ "eval_validation.parquet_accuracy_conf50__21": 0.9312242090784044,
44
+ "eval_validation.parquet_accuracy_conf50__22": 0.8710691823899371,
45
+ "eval_validation.parquet_accuracy_conf50__23": 0.8797250859106529,
46
+ "eval_validation.parquet_accuracy_conf50__3": 0.8885448916408669,
47
+ "eval_validation.parquet_accuracy_conf50__4": 0.8637137989778535,
48
+ "eval_validation.parquet_accuracy_conf50__5": 0.8816793893129771,
49
+ "eval_validation.parquet_accuracy_conf50__6": 0.9115890083632019,
50
+ "eval_validation.parquet_accuracy_conf50__7": 0.7850678733031674,
51
+ "eval_validation.parquet_accuracy_conf50__8": 0.8923512747875354,
52
+ "eval_validation.parquet_accuracy_conf50__9": 0.878345498783455,
53
+ "eval_validation.parquet_accuracy_conf75": 0.9145129224652088,
54
+ "eval_validation.parquet_accuracy_conf75__0": 0.9239130434782609,
55
+ "eval_validation.parquet_accuracy_conf75__1": 0.85546875,
56
+ "eval_validation.parquet_accuracy_conf75__10": 0.9493670886075949,
57
+ "eval_validation.parquet_accuracy_conf75__11": 0.9121338912133892,
58
+ "eval_validation.parquet_accuracy_conf75__12": 0.9450980392156862,
59
+ "eval_validation.parquet_accuracy_conf75__13": 0.8294573643410853,
60
+ "eval_validation.parquet_accuracy_conf75__14": 0.9129692832764505,
61
+ "eval_validation.parquet_accuracy_conf75__15": 0.9392265193370166,
62
+ "eval_validation.parquet_accuracy_conf75__16": 0.8972602739726028,
63
+ "eval_validation.parquet_accuracy_conf75__17": 0.8893280632411067,
64
+ "eval_validation.parquet_accuracy_conf75__18": 0.9288389513108615,
65
+ "eval_validation.parquet_accuracy_conf75__19": 0.8925925925925926,
66
+ "eval_validation.parquet_accuracy_conf75__2": 0.8809523809523809,
67
+ "eval_validation.parquet_accuracy_conf75__20": 0.8479087452471483,
68
+ "eval_validation.parquet_accuracy_conf75__21": 0.9502923976608187,
69
+ "eval_validation.parquet_accuracy_conf75__22": 0.9064748201438849,
70
+ "eval_validation.parquet_accuracy_conf75__23": 0.9404761904761905,
71
+ "eval_validation.parquet_accuracy_conf75__3": 0.9228070175438596,
72
+ "eval_validation.parquet_accuracy_conf75__4": 0.9067961165048544,
73
+ "eval_validation.parquet_accuracy_conf75__5": 0.9224137931034483,
74
+ "eval_validation.parquet_accuracy_conf75__6": 0.9374185136897001,
75
+ "eval_validation.parquet_accuracy_conf75__7": 0.8453333333333334,
76
+ "eval_validation.parquet_accuracy_conf75__8": 0.9276729559748428,
77
+ "eval_validation.parquet_accuracy_conf75__9": 0.9305354558610709,
78
+ "eval_validation.parquet_accuracy_label_average": 0.8470618003326092,
79
+ "eval_validation.parquet_accuracy_label_average_conf50": 0.8580792494248763,
80
+ "eval_validation.parquet_accuracy_label_average_conf75": 0.9081139825449239,
81
+ "eval_validation.parquet_accuracy_label_min": 0.7473118279569892,
82
+ "eval_validation.parquet_accuracy_label_min_conf50": 0.7711598746081505,
83
+ "eval_validation.parquet_accuracy_label_min_conf75": 0.8294573643410853,
84
+ "eval_validation.parquet_loss": 0.4807276427745819,
85
+ "eval_validation.parquet_proportion_conf50": 0.9788,
86
+ "eval_validation.parquet_proportion_conf75": 0.8551,
87
+ "eval_validation.parquet_runtime": 8.4462,
88
+ "eval_validation.parquet_samples_per_second": 1183.97,
89
+ "eval_validation.parquet_steps_per_second": 37.058,
90
+ "num_input_tokens_seen": 1949274656,
91
+ "test_accuracy": 0.8585,
92
+ "test_accuracy__0": 0.9345794392523364,
93
+ "test_accuracy__1": 0.7317073170731707,
94
+ "test_accuracy__10": 0.9148351648351648,
95
+ "test_accuracy__11": 0.89272030651341,
96
+ "test_accuracy__12": 0.8687196110210696,
97
+ "test_accuracy__13": 0.813953488372093,
98
+ "test_accuracy__14": 0.8615819209039548,
99
+ "test_accuracy__15": 0.717948717948718,
100
+ "test_accuracy__16": 0.8550295857988166,
101
+ "test_accuracy__17": 0.8245931283905967,
102
+ "test_accuracy__18": 0.9069767441860465,
103
+ "test_accuracy__19": 0.8333333333333334,
104
+ "test_accuracy__2": 0.8660287081339713,
105
+ "test_accuracy__20": 0.8294117647058824,
106
+ "test_accuracy__21": 0.944141689373297,
107
+ "test_accuracy__22": 0.8787878787878788,
108
+ "test_accuracy__23": 0.9,
109
+ "test_accuracy__3": 0.8470254957507082,
110
+ "test_accuracy__4": 0.8442367601246106,
111
+ "test_accuracy__5": 0.8188679245283019,
112
+ "test_accuracy__6": 0.8996655518394648,
113
+ "test_accuracy__7": 0.729490022172949,
114
+ "test_accuracy__8": 0.8937329700272479,
115
+ "test_accuracy__9": 0.8665105386416861,
116
+ "test_accuracy_conf50": 0.8673937417184793,
117
+ "test_accuracy_conf50__0": 0.9433962264150944,
118
+ "test_accuracy_conf50__1": 0.7452830188679245,
119
+ "test_accuracy_conf50__10": 0.9299719887955182,
120
+ "test_accuracy_conf50__11": 0.8957528957528957,
121
+ "test_accuracy_conf50__12": 0.8768472906403941,
122
+ "test_accuracy_conf50__13": 0.8192771084337349,
123
+ "test_accuracy_conf50__14": 0.8690647482014389,
124
+ "test_accuracy_conf50__15": 0.7236842105263158,
125
+ "test_accuracy_conf50__16": 0.8640483383685801,
126
+ "test_accuracy_conf50__17": 0.8357933579335793,
127
+ "test_accuracy_conf50__18": 0.91,
128
+ "test_accuracy_conf50__19": 0.8480565371024735,
129
+ "test_accuracy_conf50__2": 0.8768472906403941,
130
+ "test_accuracy_conf50__20": 0.8433734939759037,
131
+ "test_accuracy_conf50__21": 0.9504814305364512,
132
+ "test_accuracy_conf50__22": 0.8843537414965986,
133
+ "test_accuracy_conf50__23": 0.9028213166144201,
134
+ "test_accuracy_conf50__3": 0.8571428571428571,
135
+ "test_accuracy_conf50__4": 0.8510301109350238,
136
+ "test_accuracy_conf50__5": 0.8206106870229007,
137
+ "test_accuracy_conf50__6": 0.9071347678369196,
138
+ "test_accuracy_conf50__7": 0.7441860465116279,
139
+ "test_accuracy_conf50__8": 0.9005524861878453,
140
+ "test_accuracy_conf50__9": 0.8760529482551144,
141
+ "test_accuracy_conf75": 0.917750439367311,
142
+ "test_accuracy_conf75__0": 0.95,
143
+ "test_accuracy_conf75__1": 0.8412698412698413,
144
+ "test_accuracy_conf75__10": 0.9556213017751479,
145
+ "test_accuracy_conf75__11": 0.9297520661157025,
146
+ "test_accuracy_conf75__12": 0.9298892988929889,
147
+ "test_accuracy_conf75__13": 0.8787878787878788,
148
+ "test_accuracy_conf75__14": 0.9126050420168067,
149
+ "test_accuracy_conf75__15": 0.8253012048192772,
150
+ "test_accuracy_conf75__16": 0.8885017421602788,
151
+ "test_accuracy_conf75__17": 0.8968421052631579,
152
+ "test_accuracy_conf75__18": 0.9379562043795621,
153
+ "test_accuracy_conf75__19": 0.9112903225806451,
154
+ "test_accuracy_conf75__2": 0.9028571428571428,
155
+ "test_accuracy_conf75__20": 0.896551724137931,
156
+ "test_accuracy_conf75__21": 0.9680232558139535,
157
+ "test_accuracy_conf75__22": 0.9224806201550387,
158
+ "test_accuracy_conf75__23": 0.9444444444444444,
159
+ "test_accuracy_conf75__3": 0.931899641577061,
160
+ "test_accuracy_conf75__4": 0.8976234003656307,
161
+ "test_accuracy_conf75__5": 0.9166666666666666,
162
+ "test_accuracy_conf75__6": 0.9482976040353089,
163
+ "test_accuracy_conf75__7": 0.8040345821325648,
164
+ "test_accuracy_conf75__8": 0.9447852760736196,
165
+ "test_accuracy_conf75__9": 0.9320113314447592,
166
+ "test_accuracy_label_average": 0.853078252571446,
167
+ "test_accuracy_label_average_conf50": 0.8614901207580835,
168
+ "test_accuracy_label_average_conf75": 0.9111455290735587,
169
+ "test_accuracy_label_min": 0.717948717948718,
170
+ "test_accuracy_label_min_conf50": 0.7236842105263158,
171
+ "test_accuracy_label_min_conf75": 0.8040345821325648,
172
+ "test_loss": 0.4694322645664215,
173
+ "test_proportion_conf50": 0.9811,
174
+ "test_proportion_conf75": 0.8535,
175
+ "test_runtime": 8.3953,
176
+ "test_samples_per_second": 1191.144,
177
+ "test_steps_per_second": 37.283,
178
+ "train_loss": 1.6634563641670423,
179
+ "train_runtime": 573.9155,
180
+ "train_samples_per_second": 696.967,
181
+ "train_steps_per_second": 1.359
182
+ }
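The runtime and throughput fields allow a rough reconstruction of how many samples each split contained. This is a sketch that assumes each `*_samples_per_second` value is the number of samples divided by the matching `*_runtime` in seconds, which is the usual convention for these fields; the helper name `approx_count` is hypothetical.

```python
# Values copied from all_results.json above.
results = {
    "test_runtime": 8.3953,
    "test_samples_per_second": 1191.144,
    "eval_validation.parquet_runtime": 8.4462,
    "eval_validation.parquet_samples_per_second": 1183.97,
    "train_runtime": 573.9155,
    "train_samples_per_second": 696.967,
    "train_steps_per_second": 1.359,
}


def approx_count(runtime_s, per_second):
    # count = rate * time, rounded because the logged values are themselves rounded.
    return round(runtime_s * per_second)


test_samples = approx_count(results["test_runtime"], results["test_samples_per_second"])
val_samples = approx_count(
    results["eval_validation.parquet_runtime"],
    results["eval_validation.parquet_samples_per_second"],
)
train_samples = approx_count(results["train_runtime"], results["train_samples_per_second"])
train_steps = approx_count(results["train_runtime"], results["train_steps_per_second"])

print(test_samples, val_samples, train_samples, train_steps)
```

Under that assumption, the test and validation splits each come out at about 10,000 samples. The training figure counts samples processed across all epochs, so it is not the size of the training set itself.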
config.json ADDED
@@ -0,0 +1,98 @@
1
+ {
2
+ "architectures": [
3
+ "ModernBertForSequenceClassification"
4
+ ],
5
+ "attention_bias": false,
6
+ "attention_dropout": 0.0,
7
+ "bos_token_id": 50281,
8
+ "classifier_activation": "gelu",
9
+ "classifier_bias": false,
10
+ "classifier_dropout": 0.0,
11
+ "classifier_pooling": "mean",
12
+ "cls_token_id": 50281,
13
+ "decoder_bias": true,
14
+ "deterministic_flash_attn": false,
15
+ "embedding_dropout": 0.0,
16
+ "eos_token_id": 50282,
17
+ "global_attn_every_n_layers": 3,
18
+ "global_rope_theta": 160000.0,
19
+ "gradient_checkpointing": false,
20
+ "hidden_activation": "gelu",
21
+ "hidden_size": 768,
22
+ "id2label": {
23
+ "0": "Adult",
24
+ "1": "Art & Design",
25
+ "10": "Food & Dining",
26
+ "11": "Games",
27
+ "12": "Health",
28
+ "13": "History",
29
+ "14": "Home & Hobbies",
30
+ "15": "Industrial",
31
+ "16": "Literature",
32
+ "17": "Politics",
33
+ "18": "Religion",
34
+ "19": "Science & Tech.",
35
+ "2": "Software Dev.",
36
+ "20": "Software",
37
+ "21": "Sports & Fitness",
38
+ "22": "Transportation",
39
+ "23": "Travel",
40
+ "3": "Crime & Law",
41
+ "4": "Education & Jobs",
42
+ "5": "Hardware",
43
+ "6": "Entertainment",
44
+ "7": "Social Life",
45
+ "8": "Fashion & Beauty",
46
+ "9": "Finance & Business"
47
+ },
48
+ "initializer_cutoff_factor": 2.0,
49
+ "initializer_range": 0.02,
50
+ "intermediate_size": 1152,
51
+ "label2id": {
52
+ "Adult": 0,
53
+ "Art & Design": 1,
54
+ "Crime & Law": 3,
55
+ "Education & Jobs": 4,
56
+ "Entertainment": 6,
57
+ "Fashion & Beauty": 8,
58
+ "Finance & Business": 9,
59
+ "Food & Dining": 10,
60
+ "Games": 11,
61
+ "Hardware": 5,
62
+ "Health": 12,
63
+ "History": 13,
64
+ "Home & Hobbies": 14,
65
+ "Industrial": 15,
66
+ "Literature": 16,
67
+ "Politics": 17,
68
+ "Religion": 18,
69
+ "Science & Tech.": 19,
70
+ "Social Life": 7,
71
+ "Software": 20,
72
+ "Software Dev.": 2,
73
+ "Sports & Fitness": 21,
74
+ "Transportation": 22,
75
+ "Travel": 23
76
+ },
77
+ "layer_norm_eps": 1e-05,
78
+ "local_attention": 128,
79
+ "local_rope_theta": 10000.0,
80
+ "max_position_embeddings": 8192,
81
+ "mlp_bias": false,
82
+ "mlp_dropout": 0.0,
83
+ "model_type": "modernbert",
84
+ "norm_bias": false,
85
+ "norm_eps": 1e-05,
86
+ "num_attention_heads": 12,
87
+ "num_hidden_layers": 22,
88
+ "pad_token_id": 50283,
89
+ "position_embedding_type": "absolute",
90
+ "reference_compile": true,
91
+ "repad_logits_with_grad": false,
92
+ "sep_token_id": 50282,
93
+ "sparse_pred_ignore_index": -100,
94
+ "sparse_prediction": false,
95
+ "torch_dtype": "bfloat16",
96
+ "transformers_version": "4.50.0",
97
+ "vocab_size": 50368
98
+ }
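The `id2label` keys in the config are JSON strings and appear in string order ("0", "1", "10", ...), which is also why the per-label metrics are listed in that order. A minimal sketch of recovering the numeric label order, using a shortened fragment of the mapping above (the variable names are illustrative):

```python
import json

# A shortened fragment of the id2label mapping from config.json above.
config_fragment = json.loads("""
{
  "id2label": {
    "0": "Adult",
    "1": "Art & Design",
    "10": "Food & Dining",
    "2": "Software Dev.",
    "5": "Hardware"
  }
}
""")

# Convert the string keys to integers so that sorting follows the label ids.
id2label = {int(k): v for k, v in config_fragment["id2label"].items()}
ordered = [id2label[i] for i in sorted(id2label)]

print(ordered)
```

Sorting the raw string keys instead would place "10" before "2", so the integer conversion matters whenever label names are lined up with logit positions.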
eval_results.json ADDED
@@ -0,0 +1,91 @@
1
+ {
2
+ "epoch": 4.9728,
3
+ "eval_validation.parquet_accuracy": 0.8558,
4
+ "eval_validation.parquet_accuracy__0": 0.8627450980392157,
5
+ "eval_validation.parquet_accuracy__1": 0.7607361963190185,
6
+ "eval_validation.parquet_accuracy__10": 0.9035294117647059,
7
+ "eval_validation.parquet_accuracy__11": 0.8764478764478765,
8
+ "eval_validation.parquet_accuracy__12": 0.8980667838312829,
9
+ "eval_validation.parquet_accuracy__13": 0.7473118279569892,
10
+ "eval_validation.parquet_accuracy__14": 0.839943342776204,
11
+ "eval_validation.parquet_accuracy__15": 0.8427947598253275,
12
+ "eval_validation.parquet_accuracy__16": 0.830945558739255,
13
+ "eval_validation.parquet_accuracy__17": 0.839527027027027,
14
+ "eval_validation.parquet_accuracy__18": 0.8801369863013698,
15
+ "eval_validation.parquet_accuracy__19": 0.8117647058823529,
16
+ "eval_validation.parquet_accuracy__2": 0.7860696517412935,
17
+ "eval_validation.parquet_accuracy__20": 0.7633802816901408,
18
+ "eval_validation.parquet_accuracy__21": 0.9289617486338798,
19
+ "eval_validation.parquet_accuracy__22": 0.8562691131498471,
20
+ "eval_validation.parquet_accuracy__23": 0.8805460750853242,
21
+ "eval_validation.parquet_accuracy__3": 0.8660714285714286,
22
+ "eval_validation.parquet_accuracy__4": 0.8530884808013356,
23
+ "eval_validation.parquet_accuracy__5": 0.8694029850746269,
24
+ "eval_validation.parquet_accuracy__6": 0.900117508813161,
25
+ "eval_validation.parquet_accuracy__7": 0.7741935483870968,
26
+ "eval_validation.parquet_accuracy__8": 0.8904494382022472,
27
+ "eval_validation.parquet_accuracy__9": 0.8669833729216152,
28
+ "eval_validation.parquet_accuracy_conf50": 0.8663669799754802,
29
+ "eval_validation.parquet_accuracy_conf50__0": 0.8712871287128713,
30
+ "eval_validation.parquet_accuracy_conf50__1": 0.7711598746081505,
31
+ "eval_validation.parquet_accuracy_conf50__10": 0.9078014184397163,
32
+ "eval_validation.parquet_accuracy_conf50__11": 0.8823529411764706,
33
+ "eval_validation.parquet_accuracy_conf50__12": 0.9089285714285714,
34
+ "eval_validation.parquet_accuracy_conf50__13": 0.7828571428571428,
35
+ "eval_validation.parquet_accuracy_conf50__14": 0.8511560693641619,
36
+ "eval_validation.parquet_accuracy_conf50__15": 0.8565022421524664,
37
+ "eval_validation.parquet_accuracy_conf50__16": 0.8357771260997068,
38
+ "eval_validation.parquet_accuracy_conf50__17": 0.8456260720411664,
39
+ "eval_validation.parquet_accuracy_conf50__18": 0.8858131487889274,
40
+ "eval_validation.parquet_accuracy_conf50__19": 0.8277945619335347,
41
+ "eval_validation.parquet_accuracy_conf50__2": 0.8041237113402062,
42
+ "eval_validation.parquet_accuracy_conf50__20": 0.7794117647058824,
43
+ "eval_validation.parquet_accuracy_conf50__21": 0.9312242090784044,
44
+ "eval_validation.parquet_accuracy_conf50__22": 0.8710691823899371,
45
+ "eval_validation.parquet_accuracy_conf50__23": 0.8797250859106529,
46
+ "eval_validation.parquet_accuracy_conf50__3": 0.8885448916408669,
47
+ "eval_validation.parquet_accuracy_conf50__4": 0.8637137989778535,
48
+ "eval_validation.parquet_accuracy_conf50__5": 0.8816793893129771,
49
+ "eval_validation.parquet_accuracy_conf50__6": 0.9115890083632019,
50
+ "eval_validation.parquet_accuracy_conf50__7": 0.7850678733031674,
51
+ "eval_validation.parquet_accuracy_conf50__8": 0.8923512747875354,
52
+ "eval_validation.parquet_accuracy_conf50__9": 0.878345498783455,
53
+ "eval_validation.parquet_accuracy_conf75": 0.9145129224652088,
54
+ "eval_validation.parquet_accuracy_conf75__0": 0.9239130434782609,
55
+ "eval_validation.parquet_accuracy_conf75__1": 0.85546875,
56
+ "eval_validation.parquet_accuracy_conf75__10": 0.9493670886075949,
57
+ "eval_validation.parquet_accuracy_conf75__11": 0.9121338912133892,
58
+ "eval_validation.parquet_accuracy_conf75__12": 0.9450980392156862,
59
+ "eval_validation.parquet_accuracy_conf75__13": 0.8294573643410853,
60
+ "eval_validation.parquet_accuracy_conf75__14": 0.9129692832764505,
61
+ "eval_validation.parquet_accuracy_conf75__15": 0.9392265193370166,
62
+ "eval_validation.parquet_accuracy_conf75__16": 0.8972602739726028,
63
+ "eval_validation.parquet_accuracy_conf75__17": 0.8893280632411067,
64
+ "eval_validation.parquet_accuracy_conf75__18": 0.9288389513108615,
65
+ "eval_validation.parquet_accuracy_conf75__19": 0.8925925925925926,
66
+ "eval_validation.parquet_accuracy_conf75__2": 0.8809523809523809,
67
+ "eval_validation.parquet_accuracy_conf75__20": 0.8479087452471483,
68
+ "eval_validation.parquet_accuracy_conf75__21": 0.9502923976608187,
69
+ "eval_validation.parquet_accuracy_conf75__22": 0.9064748201438849,
70
+ "eval_validation.parquet_accuracy_conf75__23": 0.9404761904761905,
71
+ "eval_validation.parquet_accuracy_conf75__3": 0.9228070175438596,
72
+ "eval_validation.parquet_accuracy_conf75__4": 0.9067961165048544,
73
+ "eval_validation.parquet_accuracy_conf75__5": 0.9224137931034483,
74
+ "eval_validation.parquet_accuracy_conf75__6": 0.9374185136897001,
75
+ "eval_validation.parquet_accuracy_conf75__7": 0.8453333333333334,
76
+ "eval_validation.parquet_accuracy_conf75__8": 0.9276729559748428,
77
+ "eval_validation.parquet_accuracy_conf75__9": 0.9305354558610709,
78
+ "eval_validation.parquet_accuracy_label_average": 0.8470618003326092,
79
+ "eval_validation.parquet_accuracy_label_average_conf50": 0.8580792494248763,
80
+ "eval_validation.parquet_accuracy_label_average_conf75": 0.9081139825449239,
81
+ "eval_validation.parquet_accuracy_label_min": 0.7473118279569892,
82
+ "eval_validation.parquet_accuracy_label_min_conf50": 0.7711598746081505,
83
+ "eval_validation.parquet_accuracy_label_min_conf75": 0.8294573643410853,
84
+ "eval_validation.parquet_loss": 0.4807276427745819,
85
+ "eval_validation.parquet_proportion_conf50": 0.9788,
86
+ "eval_validation.parquet_proportion_conf75": 0.8551,
87
+ "eval_validation.parquet_runtime": 8.4462,
88
+ "eval_validation.parquet_samples_per_second": 1183.97,
89
+ "eval_validation.parquet_steps_per_second": 37.058,
90
+ "num_input_tokens_seen": 1949274656
91
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:7cfa7791b1c493e571e722b7bfcdc685ffdc50e53d25f8352743218b67888a47
3
+ size 299260928
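The pointer above is a small text file of `key value` lines, and its `size` field gives a rough parameter count. This is a sketch under two assumptions: the weights are stored at 2 bytes each (the config lists `torch_dtype` as `bfloat16`), and the safetensors header is small enough to ignore. The helper name `parse_lfs_pointer` is hypothetical.

```python
# Pointer contents copied from the model.safetensors entry above.
pointer_text = """version https://git-lfs.github.com/spec/v1
oid sha256:7cfa7791b1c493e571e722b7bfcdc685ffdc50e53d25f8352743218b67888a47
size 299260928
"""


def parse_lfs_pointer(text):
    # Each line is "<key> <value>"; split on the first space only.
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


pointer = parse_lfs_pointer(pointer_text)
size_bytes = int(pointer["size"])

BYTES_PER_BF16 = 2  # assumption: every tensor is stored as bfloat16
approx_params_millions = size_bytes / BYTES_PER_BF16 / 1e6

print(f"~{approx_params_millions:.1f}M parameters")
```

The estimate lands just under 150M, which is in line with the size of ModernBERT-base plus a small classification head.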
pred_results.json ADDED
@@ -0,0 +1,89 @@
1
+ {
2
+ "test_accuracy": 0.8585,
3
+ "test_accuracy__0": 0.9345794392523364,
4
+ "test_accuracy__1": 0.7317073170731707,
5
+ "test_accuracy__10": 0.9148351648351648,
6
+ "test_accuracy__11": 0.89272030651341,
7
+ "test_accuracy__12": 0.8687196110210696,
8
+ "test_accuracy__13": 0.813953488372093,
9
+ "test_accuracy__14": 0.8615819209039548,
10
+ "test_accuracy__15": 0.717948717948718,
11
+ "test_accuracy__16": 0.8550295857988166,
12
+ "test_accuracy__17": 0.8245931283905967,
13
+ "test_accuracy__18": 0.9069767441860465,
14
+ "test_accuracy__19": 0.8333333333333334,
15
+ "test_accuracy__2": 0.8660287081339713,
16
+ "test_accuracy__20": 0.8294117647058824,
17
+ "test_accuracy__21": 0.944141689373297,
18
+ "test_accuracy__22": 0.8787878787878788,
19
+ "test_accuracy__23": 0.9,
20
+ "test_accuracy__3": 0.8470254957507082,
21
+ "test_accuracy__4": 0.8442367601246106,
22
+ "test_accuracy__5": 0.8188679245283019,
23
+ "test_accuracy__6": 0.8996655518394648,
24
+ "test_accuracy__7": 0.729490022172949,
25
+ "test_accuracy__8": 0.8937329700272479,
26
+ "test_accuracy__9": 0.8665105386416861,
27
+ "test_accuracy_conf50": 0.8673937417184793,
28
+ "test_accuracy_conf50__0": 0.9433962264150944,
29
+ "test_accuracy_conf50__1": 0.7452830188679245,
30
+ "test_accuracy_conf50__10": 0.9299719887955182,
31
+ "test_accuracy_conf50__11": 0.8957528957528957,
32
+ "test_accuracy_conf50__12": 0.8768472906403941,
33
+ "test_accuracy_conf50__13": 0.8192771084337349,
34
+ "test_accuracy_conf50__14": 0.8690647482014389,
35
+ "test_accuracy_conf50__15": 0.7236842105263158,
36
+ "test_accuracy_conf50__16": 0.8640483383685801,
37
+ "test_accuracy_conf50__17": 0.8357933579335793,
38
+ "test_accuracy_conf50__18": 0.91,
39
+ "test_accuracy_conf50__19": 0.8480565371024735,
40
+ "test_accuracy_conf50__2": 0.8768472906403941,
41
+ "test_accuracy_conf50__20": 0.8433734939759037,
42
+ "test_accuracy_conf50__21": 0.9504814305364512,
43
+ "test_accuracy_conf50__22": 0.8843537414965986,
44
+ "test_accuracy_conf50__23": 0.9028213166144201,
45
+ "test_accuracy_conf50__3": 0.8571428571428571,
46
+ "test_accuracy_conf50__4": 0.8510301109350238,
47
+ "test_accuracy_conf50__5": 0.8206106870229007,
48
+ "test_accuracy_conf50__6": 0.9071347678369196,
49
+ "test_accuracy_conf50__7": 0.7441860465116279,
50
+ "test_accuracy_conf50__8": 0.9005524861878453,
51
+ "test_accuracy_conf50__9": 0.8760529482551144,
52
+ "test_accuracy_conf75": 0.917750439367311,
53
+ "test_accuracy_conf75__0": 0.95,
54
+ "test_accuracy_conf75__1": 0.8412698412698413,
55
+ "test_accuracy_conf75__10": 0.9556213017751479,
56
+ "test_accuracy_conf75__11": 0.9297520661157025,
57
+ "test_accuracy_conf75__12": 0.9298892988929889,
58
+ "test_accuracy_conf75__13": 0.8787878787878788,
59
+ "test_accuracy_conf75__14": 0.9126050420168067,
60
+ "test_accuracy_conf75__15": 0.8253012048192772,
61
+ "test_accuracy_conf75__16": 0.8885017421602788,
62
+ "test_accuracy_conf75__17": 0.8968421052631579,
63
+ "test_accuracy_conf75__18": 0.9379562043795621,
64
+ "test_accuracy_conf75__19": 0.9112903225806451,
65
+ "test_accuracy_conf75__2": 0.9028571428571428,
66
+ "test_accuracy_conf75__20": 0.896551724137931,
67
+ "test_accuracy_conf75__21": 0.9680232558139535,
68
+ "test_accuracy_conf75__22": 0.9224806201550387,
69
+ "test_accuracy_conf75__23": 0.9444444444444444,
70
+ "test_accuracy_conf75__3": 0.931899641577061,
71
+ "test_accuracy_conf75__4": 0.8976234003656307,
72
+ "test_accuracy_conf75__5": 0.9166666666666666,
73
+ "test_accuracy_conf75__6": 0.9482976040353089,
74
+ "test_accuracy_conf75__7": 0.8040345821325648,
75
+ "test_accuracy_conf75__8": 0.9447852760736196,
76
+ "test_accuracy_conf75__9": 0.9320113314447592,
77
+ "test_accuracy_label_average": 0.853078252571446,
78
+ "test_accuracy_label_average_conf50": 0.8614901207580835,
79
+ "test_accuracy_label_average_conf75": 0.9111455290735587,
80
+ "test_accuracy_label_min": 0.717948717948718,
81
+ "test_accuracy_label_min_conf50": 0.7236842105263158,
82
+ "test_accuracy_label_min_conf75": 0.8040345821325648,
83
+ "test_loss": 0.4694322645664215,
84
+ "test_proportion_conf50": 0.9811,
85
+ "test_proportion_conf75": 0.8535,
86
+ "test_runtime": 8.3953,
87
+ "test_samples_per_second": 1191.144,
88
+ "test_steps_per_second": 37.283
89
+ }
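The aggregate keys in this results file (`test_accuracy_label_average`, `test_accuracy_label_min`) appear to summarize the per-label `test_accuracy__N` entries. The following is a minimal sketch of that relationship, assuming the average is the unweighted mean over labels and that per-label keys follow the `<prefix>__<label id>` pattern. The function name `summarize_per_label` and the toy values are hypothetical and are not taken from the training code.

```python
import re
from statistics import mean


def summarize_per_label(metrics: dict, prefix: str) -> tuple[float, float]:
    """Return (unweighted mean, minimum) over keys shaped like '<prefix>__<int>'."""
    pattern = re.compile(rf"^{re.escape(prefix)}__(\d+)$")
    per_label = [v for k, v in metrics.items() if pattern.match(k)]
    if not per_label:
        raise ValueError(f"no per-label keys found for prefix {prefix!r}")
    return mean(per_label), min(per_label)


# Toy values for illustration only (not taken from the results file).
toy = {
    "test_accuracy": 0.85,
    "test_accuracy__0": 0.9,
    "test_accuracy__1": 0.7,
    "test_accuracy_conf50__0": 0.95,  # different prefix, must be ignored
}
avg, worst = summarize_per_label(toy, "test_accuracy")
print(avg, worst)
```

Passing `"test_accuracy_conf50"` or `"test_accuracy_conf75"` as the prefix would summarize the confidence-filtered variants in the same way.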
predictions.pkl ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:50bfbc2c7a8ab7edc402ea54ff39daaa87fdd2faca623782df9510ef3c878c86
+ size 1920210
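The three added lines are a Git LFS pointer: the repository stores this small text stub, while the actual `predictions.pkl` payload lives in LFS storage. As a rough sketch, such a pointer can be split into its fields as below; the helper name `parse_lfs_pointer` and the returned key names are illustrative choices, not part of any Git LFS tooling.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid is written as '<hash algorithm>:<hex digest>'.
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


pointer = """\
version https://git-lfs.github.com/spec/v1
oid sha256:50bfbc2c7a8ab7edc402ea54ff39daaa87fdd2faca623782df9510ef3c878c86
size 1920210
"""
info = parse_lfs_pointer(pointer)
print(info["hash_algo"], info["size_bytes"])
```

The same parsing applies to the TensorBoard event files under `runs/`, which are stored as LFS pointers as well.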
runs/May03_13-06-10_jzxh256/events.out.tfevents.1746270378.jzxh256.2086022.0 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a78ac1c2ecb911a5efea563326598bbb564ce04fa383ea3c8d0064403a387a1e
+ size 45695
runs/May03_13-06-10_jzxh256/events.out.tfevents.1746270961.jzxh256.2086022.1 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8c32db07844e7c6338e5c5d66fafbe2c0cbb0ce1bc05825afe484411d86f1dcc
+ size 7120
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
+ {
+   "cls_token": {
+     "content": "[CLS]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "mask_token": {
+     "content": "[MASK]",
+     "lstrip": true,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "[PAD]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "sep_token": {
+     "content": "[SEP]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "[UNK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,945 @@
1
+ {
2
+ "added_tokens_decoder": {
3
+ "0": {
4
+ "content": "|||IP_ADDRESS|||",
5
+ "lstrip": false,
6
+ "normalized": true,
7
+ "rstrip": false,
8
+ "single_word": false,
9
+ "special": false
10
+ },
11
+ "1": {
12
+ "content": "<|padding|>",
13
+ "lstrip": false,
14
+ "normalized": false,
15
+ "rstrip": false,
16
+ "single_word": false,
17
+ "special": true
18
+ },
19
+ "50254": {
20
+ "content": " ",
21
+ "lstrip": false,
22
+ "normalized": true,
23
+ "rstrip": false,
24
+ "single_word": false,
25
+ "special": false
26
+ },
27
+ "50255": {
28
+ "content": " ",
29
+ "lstrip": false,
30
+ "normalized": true,
31
+ "rstrip": false,
32
+ "single_word": false,
33
+ "special": false
34
+ },
35
+ "50256": {
36
+ "content": " ",
37
+ "lstrip": false,
38
+ "normalized": true,
39
+ "rstrip": false,
40
+ "single_word": false,
41
+ "special": false
42
+ },
43
+ "50257": {
44
+ "content": " ",
45
+ "lstrip": false,
46
+ "normalized": true,
47
+ "rstrip": false,
48
+ "single_word": false,
49
+ "special": false
50
+ },
51
+ "50258": {
52
+ "content": " ",
53
+ "lstrip": false,
54
+ "normalized": true,
55
+ "rstrip": false,
56
+ "single_word": false,
57
+ "special": false
58
+ },
59
+ "50259": {
60
+ "content": " ",
61
+ "lstrip": false,
62
+ "normalized": true,
63
+ "rstrip": false,
64
+ "single_word": false,
65
+ "special": false
66
+ },
67
+ "50260": {
68
+ "content": " ",
69
+ "lstrip": false,
70
+ "normalized": true,
71
+ "rstrip": false,
72
+ "single_word": false,
73
+ "special": false
74
+ },
75
+ "50261": {
76
+ "content": " ",
77
+ "lstrip": false,
78
+ "normalized": true,
79
+ "rstrip": false,
80
+ "single_word": false,
81
+ "special": false
82
+ },
83
+ "50262": {
84
+ "content": " ",
85
+ "lstrip": false,
86
+ "normalized": true,
87
+ "rstrip": false,
88
+ "single_word": false,
89
+ "special": false
90
+ },
91
+ "50263": {
92
+ "content": " ",
93
+ "lstrip": false,
94
+ "normalized": true,
95
+ "rstrip": false,
96
+ "single_word": false,
97
+ "special": false
98
+ },
99
+ "50264": {
100
+ "content": " ",
101
+ "lstrip": false,
102
+ "normalized": true,
103
+ "rstrip": false,
104
+ "single_word": false,
105
+ "special": false
106
+ },
107
+ "50265": {
108
+ "content": " ",
109
+ "lstrip": false,
110
+ "normalized": true,
111
+ "rstrip": false,
112
+ "single_word": false,
113
+ "special": false
114
+ },
115
+ "50266": {
116
+ "content": " ",
117
+ "lstrip": false,
118
+ "normalized": true,
119
+ "rstrip": false,
120
+ "single_word": false,
121
+ "special": false
122
+ },
123
+ "50267": {
124
+ "content": " ",
125
+ "lstrip": false,
126
+ "normalized": true,
127
+ "rstrip": false,
128
+ "single_word": false,
129
+ "special": false
130
+ },
131
+ "50268": {
132
+ "content": " ",
133
+ "lstrip": false,
134
+ "normalized": true,
135
+ "rstrip": false,
136
+ "single_word": false,
137
+ "special": false
138
+ },
139
+ "50269": {
140
+ "content": " ",
141
+ "lstrip": false,
142
+ "normalized": true,
143
+ "rstrip": false,
144
+ "single_word": false,
145
+ "special": false
146
+ },
147
+ "50270": {
148
+ "content": " ",
149
+ "lstrip": false,
150
+ "normalized": true,
151
+ "rstrip": false,
152
+ "single_word": false,
153
+ "special": false
154
+ },
155
+ "50271": {
156
+ "content": " ",
157
+ "lstrip": false,
158
+ "normalized": true,
159
+ "rstrip": false,
160
+ "single_word": false,
161
+ "special": false
162
+ },
163
+ "50272": {
164
+ "content": " ",
165
+ "lstrip": false,
166
+ "normalized": true,
167
+ "rstrip": false,
168
+ "single_word": false,
169
+ "special": false
170
+ },
171
+ "50273": {
172
+ "content": " ",
173
+ "lstrip": false,
174
+ "normalized": true,
175
+ "rstrip": false,
176
+ "single_word": false,
177
+ "special": false
178
+ },
179
+ "50274": {
180
+ "content": " ",
181
+ "lstrip": false,
182
+ "normalized": true,
183
+ "rstrip": false,
184
+ "single_word": false,
185
+ "special": false
186
+ },
187
+ "50275": {
188
+ "content": " ",
189
+ "lstrip": false,
190
+ "normalized": true,
191
+ "rstrip": false,
192
+ "single_word": false,
193
+ "special": false
194
+ },
195
+ "50276": {
196
+ "content": " ",
197
+ "lstrip": false,
198
+ "normalized": true,
199
+ "rstrip": false,
200
+ "single_word": false,
201
+ "special": false
202
+ },
203
+ "50277": {
204
+ "content": "|||EMAIL_ADDRESS|||",
205
+ "lstrip": false,
206
+ "normalized": true,
207
+ "rstrip": false,
208
+ "single_word": false,
209
+ "special": false
210
+ },
211
+ "50278": {
212
+ "content": "|||PHONE_NUMBER|||",
213
+ "lstrip": false,
214
+ "normalized": true,
215
+ "rstrip": false,
216
+ "single_word": false,
217
+ "special": false
218
+ },
219
+ "50279": {
220
+ "content": "<|endoftext|>",
221
+ "lstrip": false,
222
+ "normalized": false,
223
+ "rstrip": false,
224
+ "single_word": false,
225
+ "special": true
226
+ },
227
+ "50280": {
228
+ "content": "[UNK]",
229
+ "lstrip": false,
230
+ "normalized": false,
231
+ "rstrip": false,
232
+ "single_word": false,
233
+ "special": true
234
+ },
235
+ "50281": {
236
+ "content": "[CLS]",
237
+ "lstrip": false,
238
+ "normalized": false,
239
+ "rstrip": false,
240
+ "single_word": false,
241
+ "special": true
242
+ },
243
+ "50282": {
244
+ "content": "[SEP]",
245
+ "lstrip": false,
246
+ "normalized": false,
247
+ "rstrip": false,
248
+ "single_word": false,
249
+ "special": true
250
+ },
251
+ "50283": {
252
+ "content": "[PAD]",
253
+ "lstrip": false,
254
+ "normalized": false,
255
+ "rstrip": false,
256
+ "single_word": false,
257
+ "special": true
258
+ },
259
+ "50284": {
260
+ "content": "[MASK]",
261
+ "lstrip": true,
262
+ "normalized": false,
263
+ "rstrip": false,
264
+ "single_word": false,
265
+ "special": true
266
+ },
267
+ "50285": {
268
+ "content": "[unused0]",
269
+ "lstrip": false,
270
+ "normalized": true,
271
+ "rstrip": false,
272
+ "single_word": false,
273
+ "special": false
274
+ },
275
+ "50286": {
276
+ "content": "[unused1]",
277
+ "lstrip": false,
278
+ "normalized": true,
279
+ "rstrip": false,
280
+ "single_word": false,
281
+ "special": false
282
+ },
283
+ "50287": {
284
+ "content": "[unused2]",
285
+ "lstrip": false,
286
+ "normalized": true,
287
+ "rstrip": false,
288
+ "single_word": false,
289
+ "special": false
290
+ },
291
+ "50288": {
292
+ "content": "[unused3]",
293
+ "lstrip": false,
294
+ "normalized": true,
295
+ "rstrip": false,
296
+ "single_word": false,
297
+ "special": false
298
+ },
299
+ "50289": {
300
+ "content": "[unused4]",
301
+ "lstrip": false,
302
+ "normalized": true,
303
+ "rstrip": false,
304
+ "single_word": false,
305
+ "special": false
306
+ },
307
+ "50290": {
308
+ "content": "[unused5]",
309
+ "lstrip": false,
310
+ "normalized": true,
311
+ "rstrip": false,
312
+ "single_word": false,
313
+ "special": false
314
+ },
315
+ "50291": {
316
+ "content": "[unused6]",
317
+ "lstrip": false,
318
+ "normalized": true,
319
+ "rstrip": false,
320
+ "single_word": false,
321
+ "special": false
322
+ },
323
+ "50292": {
324
+ "content": "[unused7]",
325
+ "lstrip": false,
326
+ "normalized": true,
327
+ "rstrip": false,
328
+ "single_word": false,
329
+ "special": false
330
+ },
331
+ "50293": {
332
+ "content": "[unused8]",
333
+ "lstrip": false,
334
+ "normalized": true,
335
+ "rstrip": false,
336
+ "single_word": false,
337
+ "special": false
338
+ },
339
+ "50294": {
340
+ "content": "[unused9]",
341
+ "lstrip": false,
342
+ "normalized": true,
343
+ "rstrip": false,
344
+ "single_word": false,
345
+ "special": false
346
+ },
347
+ "50295": {
348
+ "content": "[unused10]",
349
+ "lstrip": false,
350
+ "normalized": true,
351
+ "rstrip": false,
352
+ "single_word": false,
353
+ "special": false
354
+ },
355
+ "50296": {
356
+ "content": "[unused11]",
357
+ "lstrip": false,
358
+ "normalized": true,
359
+ "rstrip": false,
360
+ "single_word": false,
361
+ "special": false
362
+ },
363
+ "50297": {
364
+ "content": "[unused12]",
365
+ "lstrip": false,
366
+ "normalized": true,
367
+ "rstrip": false,
368
+ "single_word": false,
369
+ "special": false
370
+ },
371
+ "50298": {
372
+ "content": "[unused13]",
373
+ "lstrip": false,
374
+ "normalized": true,
375
+ "rstrip": false,
376
+ "single_word": false,
377
+ "special": false
378
+ },
379
+ "50299": {
380
+ "content": "[unused14]",
381
+ "lstrip": false,
382
+ "normalized": true,
383
+ "rstrip": false,
384
+ "single_word": false,
385
+ "special": false
386
+ },
387
+ "50300": {
388
+ "content": "[unused15]",
389
+ "lstrip": false,
390
+ "normalized": true,
391
+ "rstrip": false,
392
+ "single_word": false,
393
+ "special": false
394
+ },
395
+ "50301": {
396
+ "content": "[unused16]",
397
+ "lstrip": false,
398
+ "normalized": true,
399
+ "rstrip": false,
400
+ "single_word": false,
401
+ "special": false
402
+ },
403
+ "50302": {
404
+ "content": "[unused17]",
405
+ "lstrip": false,
406
+ "normalized": true,
407
+ "rstrip": false,
408
+ "single_word": false,
409
+ "special": false
410
+ },
411
+ "50303": {
412
+ "content": "[unused18]",
413
+ "lstrip": false,
414
+ "normalized": true,
415
+ "rstrip": false,
416
+ "single_word": false,
417
+ "special": false
418
+ },
419
+ "50304": {
420
+ "content": "[unused19]",
421
+ "lstrip": false,
422
+ "normalized": true,
423
+ "rstrip": false,
424
+ "single_word": false,
425
+ "special": false
426
+ },
427
+ "50305": {
428
+ "content": "[unused20]",
429
+ "lstrip": false,
430
+ "normalized": true,
431
+ "rstrip": false,
432
+ "single_word": false,
433
+ "special": false
434
+ },
435
+ "50306": {
436
+ "content": "[unused21]",
437
+ "lstrip": false,
438
+ "normalized": true,
439
+ "rstrip": false,
440
+ "single_word": false,
441
+ "special": false
442
+ },
443
+ "50307": {
444
+ "content": "[unused22]",
445
+ "lstrip": false,
446
+ "normalized": true,
447
+ "rstrip": false,
448
+ "single_word": false,
449
+ "special": false
450
+ },
451
+ "50308": {
452
+ "content": "[unused23]",
453
+ "lstrip": false,
454
+ "normalized": true,
455
+ "rstrip": false,
456
+ "single_word": false,
457
+ "special": false
458
+ },
459
+ "50309": {
460
+ "content": "[unused24]",
461
+ "lstrip": false,
462
+ "normalized": true,
463
+ "rstrip": false,
464
+ "single_word": false,
465
+ "special": false
466
+ },
467
+ "50310": {
468
+ "content": "[unused25]",
469
+ "lstrip": false,
470
+ "normalized": true,
471
+ "rstrip": false,
472
+ "single_word": false,
473
+ "special": false
474
+ },
475
+ "50311": {
476
+ "content": "[unused26]",
477
+ "lstrip": false,
478
+ "normalized": true,
479
+ "rstrip": false,
480
+ "single_word": false,
481
+ "special": false
482
+ },
483
+ "50312": {
484
+ "content": "[unused27]",
485
+ "lstrip": false,
486
+ "normalized": true,
487
+ "rstrip": false,
488
+ "single_word": false,
489
+ "special": false
490
+ },
491
+ "50313": {
492
+ "content": "[unused28]",
493
+ "lstrip": false,
494
+ "normalized": true,
495
+ "rstrip": false,
496
+ "single_word": false,
497
+ "special": false
498
+ },
499
+ "50314": {
500
+ "content": "[unused29]",
501
+ "lstrip": false,
502
+ "normalized": true,
503
+ "rstrip": false,
504
+ "single_word": false,
505
+ "special": false
506
+ },
507
+ "50315": {
508
+ "content": "[unused30]",
509
+ "lstrip": false,
510
+ "normalized": true,
511
+ "rstrip": false,
512
+ "single_word": false,
513
+ "special": false
514
+ },
515
+ "50316": {
516
+ "content": "[unused31]",
517
+ "lstrip": false,
518
+ "normalized": true,
519
+ "rstrip": false,
520
+ "single_word": false,
521
+ "special": false
522
+ },
523
+ "50317": {
524
+ "content": "[unused32]",
525
+ "lstrip": false,
526
+ "normalized": true,
527
+ "rstrip": false,
528
+ "single_word": false,
529
+ "special": false
530
+ },
531
+ "50318": {
532
+ "content": "[unused33]",
533
+ "lstrip": false,
534
+ "normalized": true,
535
+ "rstrip": false,
536
+ "single_word": false,
537
+ "special": false
538
+ },
539
+ "50319": {
540
+ "content": "[unused34]",
541
+ "lstrip": false,
542
+ "normalized": true,
543
+ "rstrip": false,
544
+ "single_word": false,
545
+ "special": false
546
+ },
547
+ "50320": {
548
+ "content": "[unused35]",
549
+ "lstrip": false,
550
+ "normalized": true,
551
+ "rstrip": false,
552
+ "single_word": false,
553
+ "special": false
554
+ },
555
+ "50321": {
556
+ "content": "[unused36]",
557
+ "lstrip": false,
558
+ "normalized": true,
559
+ "rstrip": false,
560
+ "single_word": false,
561
+ "special": false
562
+ },
563
+ "50322": {
564
+ "content": "[unused37]",
565
+ "lstrip": false,
566
+ "normalized": true,
567
+ "rstrip": false,
568
+ "single_word": false,
569
+ "special": false
570
+ },
571
+ "50323": {
572
+ "content": "[unused38]",
573
+ "lstrip": false,
574
+ "normalized": true,
575
+ "rstrip": false,
576
+ "single_word": false,
577
+ "special": false
578
+ },
579
+ "50324": {
580
+ "content": "[unused39]",
581
+ "lstrip": false,
582
+ "normalized": true,
583
+ "rstrip": false,
584
+ "single_word": false,
585
+ "special": false
586
+ },
587
+ "50325": {
588
+ "content": "[unused40]",
589
+ "lstrip": false,
590
+ "normalized": true,
591
+ "rstrip": false,
592
+ "single_word": false,
593
+ "special": false
594
+ },
595
+ "50326": {
596
+ "content": "[unused41]",
597
+ "lstrip": false,
598
+ "normalized": true,
599
+ "rstrip": false,
600
+ "single_word": false,
601
+ "special": false
602
+ },
603
+ "50327": {
604
+ "content": "[unused42]",
605
+ "lstrip": false,
606
+ "normalized": true,
607
+ "rstrip": false,
608
+ "single_word": false,
609
+ "special": false
610
+ },
611
+ "50328": {
612
+ "content": "[unused43]",
613
+ "lstrip": false,
614
+ "normalized": true,
615
+ "rstrip": false,
616
+ "single_word": false,
617
+ "special": false
618
+ },
619
+ "50329": {
620
+ "content": "[unused44]",
621
+ "lstrip": false,
622
+ "normalized": true,
623
+ "rstrip": false,
624
+ "single_word": false,
625
+ "special": false
626
+ },
627
+ "50330": {
628
+ "content": "[unused45]",
629
+ "lstrip": false,
630
+ "normalized": true,
631
+ "rstrip": false,
632
+ "single_word": false,
633
+ "special": false
634
+ },
635
+ "50331": {
636
+ "content": "[unused46]",
637
+ "lstrip": false,
638
+ "normalized": true,
639
+ "rstrip": false,
640
+ "single_word": false,
641
+ "special": false
642
+ },
643
+ "50332": {
644
+ "content": "[unused47]",
645
+ "lstrip": false,
646
+ "normalized": true,
647
+ "rstrip": false,
648
+ "single_word": false,
649
+ "special": false
650
+ },
651
+ "50333": {
652
+ "content": "[unused48]",
653
+ "lstrip": false,
654
+ "normalized": true,
655
+ "rstrip": false,
656
+ "single_word": false,
657
+ "special": false
658
+ },
659
+ "50334": {
660
+ "content": "[unused49]",
661
+ "lstrip": false,
662
+ "normalized": true,
663
+ "rstrip": false,
664
+ "single_word": false,
665
+ "special": false
666
+ },
667
+ "50335": {
668
+ "content": "[unused50]",
669
+ "lstrip": false,
670
+ "normalized": true,
671
+ "rstrip": false,
672
+ "single_word": false,
673
+ "special": false
674
+ },
675
+ "50336": {
676
+ "content": "[unused51]",
677
+ "lstrip": false,
678
+ "normalized": true,
679
+ "rstrip": false,
680
+ "single_word": false,
681
+ "special": false
682
+ },
683
+ "50337": {
684
+ "content": "[unused52]",
685
+ "lstrip": false,
686
+ "normalized": true,
687
+ "rstrip": false,
688
+ "single_word": false,
689
+ "special": false
690
+ },
691
+ "50338": {
692
+ "content": "[unused53]",
693
+ "lstrip": false,
694
+ "normalized": true,
695
+ "rstrip": false,
696
+ "single_word": false,
697
+ "special": false
698
+ },
699
+ "50339": {
700
+ "content": "[unused54]",
701
+ "lstrip": false,
702
+ "normalized": true,
703
+ "rstrip": false,
704
+ "single_word": false,
705
+ "special": false
706
+ },
707
+ "50340": {
708
+ "content": "[unused55]",
709
+ "lstrip": false,
710
+ "normalized": true,
711
+ "rstrip": false,
712
+ "single_word": false,
713
+ "special": false
714
+ },
715
+ "50341": {
716
+ "content": "[unused56]",
717
+ "lstrip": false,
718
+ "normalized": true,
719
+ "rstrip": false,
720
+ "single_word": false,
721
+ "special": false
722
+ },
723
+ "50342": {
724
+ "content": "[unused57]",
725
+ "lstrip": false,
726
+ "normalized": true,
727
+ "rstrip": false,
728
+ "single_word": false,
729
+ "special": false
730
+ },
731
+ "50343": {
732
+ "content": "[unused58]",
733
+ "lstrip": false,
734
+ "normalized": true,
735
+ "rstrip": false,
736
+ "single_word": false,
737
+ "special": false
738
+ },
739
+ "50344": {
740
+ "content": "[unused59]",
741
+ "lstrip": false,
742
+ "normalized": true,
743
+ "rstrip": false,
744
+ "single_word": false,
745
+ "special": false
746
+ },
747
+ "50345": {
748
+ "content": "[unused60]",
749
+ "lstrip": false,
750
+ "normalized": true,
751
+ "rstrip": false,
752
+ "single_word": false,
753
+ "special": false
754
+ },
755
+ "50346": {
756
+ "content": "[unused61]",
757
+ "lstrip": false,
758
+ "normalized": true,
759
+ "rstrip": false,
760
+ "single_word": false,
761
+ "special": false
762
+ },
763
+ "50347": {
764
+ "content": "[unused62]",
765
+ "lstrip": false,
766
+ "normalized": true,
767
+ "rstrip": false,
768
+ "single_word": false,
769
+ "special": false
770
+ },
771
+ "50348": {
772
+ "content": "[unused63]",
773
+ "lstrip": false,
774
+ "normalized": true,
775
+ "rstrip": false,
776
+ "single_word": false,
777
+ "special": false
778
+ },
779
+ "50349": {
780
+ "content": "[unused64]",
781
+ "lstrip": false,
782
+ "normalized": true,
783
+ "rstrip": false,
784
+ "single_word": false,
785
+ "special": false
786
+ },
787
+ "50350": {
788
+ "content": "[unused65]",
789
+ "lstrip": false,
790
+ "normalized": true,
791
+ "rstrip": false,
792
+ "single_word": false,
793
+ "special": false
794
+ },
795
+ "50351": {
796
+ "content": "[unused66]",
797
+ "lstrip": false,
798
+ "normalized": true,
799
+ "rstrip": false,
800
+ "single_word": false,
801
+ "special": false
802
+ },
803
+ "50352": {
804
+ "content": "[unused67]",
805
+ "lstrip": false,
806
+ "normalized": true,
807
+ "rstrip": false,
808
+ "single_word": false,
809
+ "special": false
810
+ },
811
+ "50353": {
812
+ "content": "[unused68]",
813
+ "lstrip": false,
814
+ "normalized": true,
815
+ "rstrip": false,
816
+ "single_word": false,
817
+ "special": false
818
+ },
819
+ "50354": {
820
+ "content": "[unused69]",
821
+ "lstrip": false,
822
+ "normalized": true,
823
+ "rstrip": false,
824
+ "single_word": false,
825
+ "special": false
826
+ },
827
+ "50355": {
828
+ "content": "[unused70]",
829
+ "lstrip": false,
830
+ "normalized": true,
831
+ "rstrip": false,
832
+ "single_word": false,
833
+ "special": false
834
+ },
835
+ "50356": {
836
+ "content": "[unused71]",
837
+ "lstrip": false,
838
+ "normalized": true,
839
+ "rstrip": false,
840
+ "single_word": false,
841
+ "special": false
842
+ },
843
+ "50357": {
844
+ "content": "[unused72]",
845
+ "lstrip": false,
846
+ "normalized": true,
847
+ "rstrip": false,
848
+ "single_word": false,
849
+ "special": false
850
+ },
851
+ "50358": {
852
+ "content": "[unused73]",
853
+ "lstrip": false,
854
+ "normalized": true,
855
+ "rstrip": false,
856
+ "single_word": false,
857
+ "special": false
858
+ },
859
+ "50359": {
860
+ "content": "[unused74]",
861
+ "lstrip": false,
862
+ "normalized": true,
863
+ "rstrip": false,
864
+ "single_word": false,
865
+ "special": false
866
+ },
867
+ "50360": {
868
+ "content": "[unused75]",
869
+ "lstrip": false,
870
+ "normalized": true,
871
+ "rstrip": false,
872
+ "single_word": false,
873
+ "special": false
874
+ },
875
+ "50361": {
876
+ "content": "[unused76]",
877
+ "lstrip": false,
878
+ "normalized": true,
879
+ "rstrip": false,
880
+ "single_word": false,
881
+ "special": false
882
+ },
883
+ "50362": {
884
+ "content": "[unused77]",
885
+ "lstrip": false,
886
+ "normalized": true,
887
+ "rstrip": false,
888
+ "single_word": false,
889
+ "special": false
890
+ },
891
+ "50363": {
892
+ "content": "[unused78]",
893
+ "lstrip": false,
894
+ "normalized": true,
895
+ "rstrip": false,
896
+ "single_word": false,
897
+ "special": false
898
+ },
899
+ "50364": {
900
+ "content": "[unused79]",
901
+ "lstrip": false,
902
+ "normalized": true,
903
+ "rstrip": false,
904
+ "single_word": false,
905
+ "special": false
906
+ },
907
+ "50365": {
908
+ "content": "[unused80]",
909
+ "lstrip": false,
910
+ "normalized": true,
911
+ "rstrip": false,
912
+ "single_word": false,
913
+ "special": false
914
+ },
915
+ "50366": {
916
+ "content": "[unused81]",
917
+ "lstrip": false,
918
+ "normalized": true,
919
+ "rstrip": false,
920
+ "single_word": false,
921
+ "special": false
922
+ },
923
+ "50367": {
924
+ "content": "[unused82]",
925
+ "lstrip": false,
926
+ "normalized": true,
927
+ "rstrip": false,
928
+ "single_word": false,
929
+ "special": false
930
+ }
931
+ },
932
+ "clean_up_tokenization_spaces": true,
933
+ "cls_token": "[CLS]",
934
+ "extra_special_tokens": {},
935
+ "mask_token": "[MASK]",
936
+ "model_input_names": [
937
+ "input_ids",
938
+ "attention_mask"
939
+ ],
940
+ "model_max_length": 8192,
941
+ "pad_token": "[PAD]",
942
+ "sep_token": "[SEP]",
943
+ "tokenizer_class": "PreTrainedTokenizer",
944
+ "unk_token": "[UNK]"
945
+ }
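In `added_tokens_decoder`, each entry carries a `special` flag; in the config above only `<|padding|>`, `<|endoftext|>`, `[UNK]`, `[CLS]`, `[SEP]`, `[PAD]` and `[MASK]` are flagged as special, while the `[unusedN]` placeholders and the PII markers are not. A minimal sketch for pulling out the flagged entries, using only the standard library; the helper name `special_tokens` is hypothetical and the embedded JSON is an abbreviated excerpt rather than the full file.

```python
import json


def special_tokens(config: dict) -> dict[int, str]:
    """Map token id -> content for entries flagged "special": true."""
    decoder = config.get("added_tokens_decoder", {})
    return {
        int(token_id): entry["content"]
        for token_id, entry in decoder.items()
        if entry.get("special", False)
    }


# Abbreviated excerpt of the config (three entries, other fields omitted).
excerpt = json.loads("""
{
  "added_tokens_decoder": {
    "50281": {"content": "[CLS]", "special": true},
    "50284": {"content": "[MASK]", "special": true},
    "50285": {"content": "[unused0]", "special": false}
  }
}
""")
print(special_tokens(excerpt))
```

Run against the full `tokenizer_config.json`, the same function would be expected to return the seven special entries listed above.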
train_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+   "epoch": 4.9728,
+   "num_input_tokens_seen": 1949274656,
+   "train_loss": 1.6634563641670423,
+   "train_runtime": 573.9155,
+   "train_samples_per_second": 696.967,
+   "train_steps_per_second": 1.359
+ }
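The rate fields in `train_results.json` can be cross-checked against the runtime. The sketch below assumes that `train_runtime` is in seconds and that the per-second figures are averages over the whole run; both are assumptions about how the values were logged, not something this file states.

```python
# Values copied from train_results.json above.
train_results = {
    "epoch": 4.9728,
    "num_input_tokens_seen": 1949274656,
    "train_runtime": 573.9155,  # assumed to be seconds
    "train_samples_per_second": 696.967,
    "train_steps_per_second": 1.359,
}

runtime = train_results["train_runtime"]

# Rate * duration recovers the approximate totals for the run.
approx_steps = train_results["train_steps_per_second"] * runtime
approx_samples = train_results["train_samples_per_second"] * runtime
tokens_per_second = train_results["num_input_tokens_seen"] / runtime

print(round(approx_steps), round(approx_samples), round(tokens_per_second))
```

Under these assumptions the step estimate lands at roughly 780, which agrees with the `global_step` recorded in `trainer_state.json`.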
trainer_state.json ADDED
@@ -0,0 +1,560 @@
1
+ {
2
+ "best_global_step": 471,
3
+ "best_metric": 0.7473118279569892,
4
+ "best_model_checkpoint": "/linkhome/rech/genini01/udd26kf/scratch/weborganizer/models/runs/answerdotai--ModernBERT-base_TopicAnnotations-Llama-3.1-8B_bsz512_lr1e-4_epochs5_warmup0.1_url1_TopicAnnotations-Llama-3.1-405B-FP8_bsz512_lr1e-4_epochs5_warmup0.1_url1/checkpoint-471",
5
+ "epoch": 4.9728,
6
+ "eval_steps": 500,
7
+ "global_step": 780,
8
+ "is_hyper_param_search": false,
9
+ "is_local_process_zero": true,
10
+ "is_world_process_zero": true,
11
+ "log_history": [
12
+ {
13
+ "epoch": 0.64,
14
+ "grad_norm": 8.25,
15
+ "learning_rate": 9.686609686609687e-05,
16
+ "loss": 2.1544,
17
+ "num_input_tokens_seen": 249204064,
18
+ "step": 100
19
+ },
20
+ {
21
+ "epoch": 1.0,
22
+ "eval_validation.parquet_accuracy": 0.8451,
23
+ "eval_validation.parquet_accuracy__0": 0.9019607843137255,
24
+ "eval_validation.parquet_accuracy__1": 0.7975460122699386,
25
+ "eval_validation.parquet_accuracy__10": 0.9105882352941177,
26
+ "eval_validation.parquet_accuracy__11": 0.8687258687258688,
27
+ "eval_validation.parquet_accuracy__12": 0.8734622144112478,
28
+ "eval_validation.parquet_accuracy__13": 0.6827956989247311,
29
+ "eval_validation.parquet_accuracy__14": 0.8229461756373938,
30
+ "eval_validation.parquet_accuracy__15": 0.8427947598253275,
31
+ "eval_validation.parquet_accuracy__16": 0.8194842406876791,
32
+ "eval_validation.parquet_accuracy__17": 0.8260135135135135,
33
+ "eval_validation.parquet_accuracy__18": 0.8732876712328768,
34
+ "eval_validation.parquet_accuracy__19": 0.861764705882353,
35
+ "eval_validation.parquet_accuracy__2": 0.8159203980099502,
36
+ "eval_validation.parquet_accuracy__20": 0.7183098591549296,
37
+ "eval_validation.parquet_accuracy__21": 0.8975409836065574,
38
+ "eval_validation.parquet_accuracy__22": 0.7981651376146789,
39
+ "eval_validation.parquet_accuracy__23": 0.863481228668942,
40
+ "eval_validation.parquet_accuracy__3": 0.9017857142857143,
41
+ "eval_validation.parquet_accuracy__4": 0.8697829716193656,
42
+ "eval_validation.parquet_accuracy__5": 0.8246268656716418,
43
+ "eval_validation.parquet_accuracy__6": 0.8907168037602821,
44
+ "eval_validation.parquet_accuracy__7": 0.7784946236559139,
45
+ "eval_validation.parquet_accuracy__8": 0.8932584269662921,
46
+ "eval_validation.parquet_accuracy__9": 0.8230403800475059,
47
+ "eval_validation.parquet_accuracy_conf50": 0.8559460563955864,
48
+ "eval_validation.parquet_accuracy_conf50__0": 0.9108910891089109,
49
+ "eval_validation.parquet_accuracy_conf50__1": 0.8087774294670846,
50
+ "eval_validation.parquet_accuracy_conf50__10": 0.9148936170212766,
51
+ "eval_validation.parquet_accuracy_conf50__11": 0.8745098039215686,
52
+ "eval_validation.parquet_accuracy_conf50__12": 0.8857142857142857,
53
+ "eval_validation.parquet_accuracy_conf50__13": 0.7085714285714285,
54
+ "eval_validation.parquet_accuracy_conf50__14": 0.8338150289017341,
55
+ "eval_validation.parquet_accuracy_conf50__15": 0.8609865470852018,
56
+ "eval_validation.parquet_accuracy_conf50__16": 0.8240469208211144,
57
+ "eval_validation.parquet_accuracy_conf50__17": 0.8319039451114922,
58
+ "eval_validation.parquet_accuracy_conf50__18": 0.8788927335640139,
59
+ "eval_validation.parquet_accuracy_conf50__19": 0.8761329305135952,
60
+ "eval_validation.parquet_accuracy_conf50__2": 0.8350515463917526,
61
+ "eval_validation.parquet_accuracy_conf50__20": 0.7323529411764705,
62
+ "eval_validation.parquet_accuracy_conf50__21": 0.9009628610729024,
63
+ "eval_validation.parquet_accuracy_conf50__22": 0.8113207547169812,
64
+ "eval_validation.parquet_accuracy_conf50__23": 0.865979381443299,
65
+ "eval_validation.parquet_accuracy_conf50__3": 0.9195046439628483,
66
+ "eval_validation.parquet_accuracy_conf50__4": 0.8807495741056218,
67
+ "eval_validation.parquet_accuracy_conf50__5": 0.8358778625954199,
68
+ "eval_validation.parquet_accuracy_conf50__6": 0.9020310633213859,
69
+ "eval_validation.parquet_accuracy_conf50__7": 0.7986425339366516,
70
+ "eval_validation.parquet_accuracy_conf50__8": 0.8951841359773371,
71
+ "eval_validation.parquet_accuracy_conf50__9": 0.8345498783454988,
72
+ "eval_validation.parquet_accuracy_conf75": 0.9065606361829026,
73
+ "eval_validation.parquet_accuracy_conf75__0": 0.967391304347826,
74
+ "eval_validation.parquet_accuracy_conf75__1": 0.8828125,
75
+ "eval_validation.parquet_accuracy_conf75__10": 0.9493670886075949,
76
+ "eval_validation.parquet_accuracy_conf75__11": 0.9037656903765691,
77
+ "eval_validation.parquet_accuracy_conf75__12": 0.9176470588235294,
78
+ "eval_validation.parquet_accuracy_conf75__13": 0.7906976744186046,
79
+ "eval_validation.parquet_accuracy_conf75__14": 0.9027303754266212,
80
+ "eval_validation.parquet_accuracy_conf75__15": 0.9281767955801105,
81
+ "eval_validation.parquet_accuracy_conf75__16": 0.886986301369863,
82
+ "eval_validation.parquet_accuracy_conf75__17": 0.8814229249011858,
83
+ "eval_validation.parquet_accuracy_conf75__18": 0.9176029962546817,
84
+ "eval_validation.parquet_accuracy_conf75__19": 0.9185185185185185,
85
+ "eval_validation.parquet_accuracy_conf75__2": 0.9047619047619048,
86
+ "eval_validation.parquet_accuracy_conf75__20": 0.8022813688212928,
87
+ "eval_validation.parquet_accuracy_conf75__21": 0.9327485380116959,
88
+ "eval_validation.parquet_accuracy_conf75__22": 0.8741007194244604,
89
+ "eval_validation.parquet_accuracy_conf75__23": 0.9246031746031746,
90
+ "eval_validation.parquet_accuracy_conf75__3": 0.9473684210526315,
91
+ "eval_validation.parquet_accuracy_conf75__4": 0.9242718446601942,
92
+ "eval_validation.parquet_accuracy_conf75__5": 0.8879310344827587,
93
+ "eval_validation.parquet_accuracy_conf75__6": 0.9322033898305084,
94
+ "eval_validation.parquet_accuracy_conf75__7": 0.848,
95
+ "eval_validation.parquet_accuracy_conf75__8": 0.9339622641509434,
96
+ "eval_validation.parquet_accuracy_conf75__9": 0.9001447178002895,
97
+ "eval_validation.parquet_accuracy_label_average": 0.8398538864075228,
98
+ "eval_validation.parquet_accuracy_label_average_conf50": 0.8508892890353281,
99
+ "eval_validation.parquet_accuracy_label_average_conf75": 0.9024790252593734,
100
+ "eval_validation.parquet_accuracy_label_min": 0.6827956989247311,
101
+ "eval_validation.parquet_accuracy_label_min_conf50": 0.7085714285714285,
102
+ "eval_validation.parquet_accuracy_label_min_conf75": 0.7906976744186046,
103
+ "eval_validation.parquet_loss": 0.5004527568817139,
104
+ "eval_validation.parquet_proportion_conf50": 0.9788,
105
+ "eval_validation.parquet_proportion_conf75": 0.8551,
106
+ "eval_validation.parquet_runtime": 10.52,
107
+ "eval_validation.parquet_samples_per_second": 950.571,
108
+ "eval_validation.parquet_steps_per_second": 29.753,
109
+ "num_input_tokens_seen": 390215936,
110
+ "step": 157
111
+ },
112
+ {
113
+ "epoch": 1.2752,
114
+ "grad_norm": 9.875,
115
+ "learning_rate": 8.262108262108262e-05,
116
+ "loss": 1.8475,
117
+ "num_input_tokens_seen": 499147424,
118
+ "step": 200
119
+ },
120
+ {
121
+ "epoch": 1.9152,
122
+ "grad_norm": 7.53125,
123
+ "learning_rate": 6.837606837606838e-05,
124
+ "loss": 1.7317,
125
+ "num_input_tokens_seen": 751160992,
126
+ "step": 300
127
+ },
128
+ {
129
+ "epoch": 2.0,
130
+ "eval_validation.parquet_accuracy": 0.8526,
131
+ "eval_validation.parquet_accuracy__0": 0.8725490196078431,
132
+ "eval_validation.parquet_accuracy__1": 0.8128834355828221,
133
+ "eval_validation.parquet_accuracy__10": 0.9176470588235294,
134
+ "eval_validation.parquet_accuracy__11": 0.9073359073359073,
135
+ "eval_validation.parquet_accuracy__12": 0.9138840070298769,
136
+ "eval_validation.parquet_accuracy__13": 0.7419354838709677,
137
+ "eval_validation.parquet_accuracy__14": 0.7818696883852692,
138
+ "eval_validation.parquet_accuracy__15": 0.8427947598253275,
139
+ "eval_validation.parquet_accuracy__16": 0.8481375358166189,
140
+ "eval_validation.parquet_accuracy__17": 0.8733108108108109,
141
+ "eval_validation.parquet_accuracy__18": 0.8732876712328768,
142
+ "eval_validation.parquet_accuracy__19": 0.8205882352941176,
143
+ "eval_validation.parquet_accuracy__2": 0.7860696517412935,
144
+ "eval_validation.parquet_accuracy__20": 0.7830985915492957,
145
+ "eval_validation.parquet_accuracy__21": 0.9344262295081968,
146
+ "eval_validation.parquet_accuracy__22": 0.8562691131498471,
147
+ "eval_validation.parquet_accuracy__23": 0.9078498293515358,
148
+ "eval_validation.parquet_accuracy__3": 0.8541666666666666,
149
+ "eval_validation.parquet_accuracy__4": 0.8414023372287145,
150
+ "eval_validation.parquet_accuracy__5": 0.8208955223880597,
151
+ "eval_validation.parquet_accuracy__6": 0.8883666274970623,
152
+ "eval_validation.parquet_accuracy__7": 0.7784946236559139,
153
+ "eval_validation.parquet_accuracy__8": 0.8960674157303371,
154
+ "eval_validation.parquet_accuracy__9": 0.8111638954869359,
155
+ "eval_validation.parquet_accuracy_conf50": 0.8627911728647323,
156
+ "eval_validation.parquet_accuracy_conf50__0": 0.8811881188118812,
157
+ "eval_validation.parquet_accuracy_conf50__1": 0.8244514106583072,
158
+ "eval_validation.parquet_accuracy_conf50__10": 0.9219858156028369,
159
+ "eval_validation.parquet_accuracy_conf50__11": 0.9137254901960784,
160
+ "eval_validation.parquet_accuracy_conf50__12": 0.9214285714285714,
161
+ "eval_validation.parquet_accuracy_conf50__13": 0.7771428571428571,
162
+ "eval_validation.parquet_accuracy_conf50__14": 0.7947976878612717,
163
+ "eval_validation.parquet_accuracy_conf50__15": 0.8565022421524664,
164
+ "eval_validation.parquet_accuracy_conf50__16": 0.8533724340175953,
165
+ "eval_validation.parquet_accuracy_conf50__17": 0.8782161234991424,
166
+ "eval_validation.parquet_accuracy_conf50__18": 0.8788927335640139,
167
+ "eval_validation.parquet_accuracy_conf50__19": 0.8368580060422961,
168
+ "eval_validation.parquet_accuracy_conf50__2": 0.8041237113402062,
169
+ "eval_validation.parquet_accuracy_conf50__20": 0.8,
170
+ "eval_validation.parquet_accuracy_conf50__21": 0.936726272352132,
171
+ "eval_validation.parquet_accuracy_conf50__22": 0.8679245283018868,
172
+ "eval_validation.parquet_accuracy_conf50__23": 0.9072164948453608,
173
+ "eval_validation.parquet_accuracy_conf50__3": 0.8761609907120743,
174
+ "eval_validation.parquet_accuracy_conf50__4": 0.8534923339011925,
175
+ "eval_validation.parquet_accuracy_conf50__5": 0.8320610687022901,
176
+ "eval_validation.parquet_accuracy_conf50__6": 0.8984468339307049,
177
+ "eval_validation.parquet_accuracy_conf50__7": 0.7873303167420814,
178
+ "eval_validation.parquet_accuracy_conf50__8": 0.8980169971671388,
179
+ "eval_validation.parquet_accuracy_conf50__9": 0.8211678832116789,
180
+ "eval_validation.parquet_accuracy_conf75": 0.9124079055081277,
181
+ "eval_validation.parquet_accuracy_conf75__0": 0.9347826086956522,
182
+ "eval_validation.parquet_accuracy_conf75__1": 0.8984375,
183
+ "eval_validation.parquet_accuracy_conf75__10": 0.9620253164556962,
184
+ "eval_validation.parquet_accuracy_conf75__11": 0.9372384937238494,
185
+ "eval_validation.parquet_accuracy_conf75__12": 0.9529411764705882,
186
+ "eval_validation.parquet_accuracy_conf75__13": 0.8294573643410853,
187
+ "eval_validation.parquet_accuracy_conf75__14": 0.8686006825938567,
188
+ "eval_validation.parquet_accuracy_conf75__15": 0.9226519337016574,
189
+ "eval_validation.parquet_accuracy_conf75__16": 0.9143835616438356,
190
+ "eval_validation.parquet_accuracy_conf75__17": 0.9209486166007905,
191
+ "eval_validation.parquet_accuracy_conf75__18": 0.9250936329588015,
192
+ "eval_validation.parquet_accuracy_conf75__19": 0.8962962962962963,
193
+ "eval_validation.parquet_accuracy_conf75__2": 0.8809523809523809,
194
+ "eval_validation.parquet_accuracy_conf75__20": 0.8593155893536122,
195
+ "eval_validation.parquet_accuracy_conf75__21": 0.9576023391812866,
196
+ "eval_validation.parquet_accuracy_conf75__22": 0.8992805755395683,
197
+ "eval_validation.parquet_accuracy_conf75__23": 0.9603174603174603,
198
+ "eval_validation.parquet_accuracy_conf75__3": 0.9192982456140351,
199
+ "eval_validation.parquet_accuracy_conf75__4": 0.8990291262135922,
200
+ "eval_validation.parquet_accuracy_conf75__5": 0.8879310344827587,
201
+ "eval_validation.parquet_accuracy_conf75__6": 0.9282920469361148,
202
+ "eval_validation.parquet_accuracy_conf75__7": 0.84,
203
+ "eval_validation.parquet_accuracy_conf75__8": 0.9339622641509434,
204
+ "eval_validation.parquet_accuracy_conf75__9": 0.8900144717800289,
205
+ "eval_validation.parquet_accuracy_label_average": 0.8485205882320762,
206
+ "eval_validation.parquet_accuracy_label_average_conf50": 0.8592178717576693,
207
+ "eval_validation.parquet_accuracy_label_average_conf75": 0.909118863250162,
208
+ "eval_validation.parquet_accuracy_label_min": 0.7419354838709677,
209
+ "eval_validation.parquet_accuracy_label_min_conf50": 0.7771428571428571,
210
+ "eval_validation.parquet_accuracy_label_min_conf75": 0.8294573643410853,
211
+ "eval_validation.parquet_loss": 0.4816047251224518,
212
+ "eval_validation.parquet_proportion_conf50": 0.9788,
213
+ "eval_validation.parquet_proportion_conf75": 0.8551,
214
+ "eval_validation.parquet_runtime": 8.307,
215
+ "eval_validation.parquet_samples_per_second": 1203.799,
216
+ "eval_validation.parquet_steps_per_second": 37.679,
217
+ "num_input_tokens_seen": 783399104,
218
+ "step": 314
219
+ },
220
+ {
221
+ "epoch": 2.5504,
222
+ "grad_norm": 7.59375,
223
+ "learning_rate": 5.413105413105414e-05,
224
+ "loss": 1.5837,
225
+ "num_input_tokens_seen": 999700736,
226
+ "step": 400
227
+ },
228
+ {
229
+ "epoch": 3.0,
230
+ "eval_validation.parquet_accuracy": 0.8558,
231
+ "eval_validation.parquet_accuracy__0": 0.8627450980392157,
232
+ "eval_validation.parquet_accuracy__1": 0.7607361963190185,
233
+ "eval_validation.parquet_accuracy__10": 0.9035294117647059,
234
+ "eval_validation.parquet_accuracy__11": 0.8764478764478765,
235
+ "eval_validation.parquet_accuracy__12": 0.8980667838312829,
236
+ "eval_validation.parquet_accuracy__13": 0.7473118279569892,
237
+ "eval_validation.parquet_accuracy__14": 0.839943342776204,
238
+ "eval_validation.parquet_accuracy__15": 0.8427947598253275,
239
+ "eval_validation.parquet_accuracy__16": 0.830945558739255,
240
+ "eval_validation.parquet_accuracy__17": 0.839527027027027,
241
+ "eval_validation.parquet_accuracy__18": 0.8801369863013698,
242
+ "eval_validation.parquet_accuracy__19": 0.8117647058823529,
243
+ "eval_validation.parquet_accuracy__2": 0.7860696517412935,
244
+ "eval_validation.parquet_accuracy__20": 0.7633802816901408,
245
+ "eval_validation.parquet_accuracy__21": 0.9289617486338798,
246
+ "eval_validation.parquet_accuracy__22": 0.8562691131498471,
247
+ "eval_validation.parquet_accuracy__23": 0.8805460750853242,
248
+ "eval_validation.parquet_accuracy__3": 0.8660714285714286,
249
+ "eval_validation.parquet_accuracy__4": 0.8530884808013356,
250
+ "eval_validation.parquet_accuracy__5": 0.8694029850746269,
251
+ "eval_validation.parquet_accuracy__6": 0.900117508813161,
252
+ "eval_validation.parquet_accuracy__7": 0.7741935483870968,
253
+ "eval_validation.parquet_accuracy__8": 0.8904494382022472,
254
+ "eval_validation.parquet_accuracy__9": 0.8669833729216152,
255
+ "eval_validation.parquet_accuracy_conf50": 0.8663669799754802,
256
+ "eval_validation.parquet_accuracy_conf50__0": 0.8712871287128713,
257
+ "eval_validation.parquet_accuracy_conf50__1": 0.7711598746081505,
258
+ "eval_validation.parquet_accuracy_conf50__10": 0.9078014184397163,
259
+ "eval_validation.parquet_accuracy_conf50__11": 0.8823529411764706,
260
+ "eval_validation.parquet_accuracy_conf50__12": 0.9089285714285714,
261
+ "eval_validation.parquet_accuracy_conf50__13": 0.7828571428571428,
262
+ "eval_validation.parquet_accuracy_conf50__14": 0.8511560693641619,
263
+ "eval_validation.parquet_accuracy_conf50__15": 0.8565022421524664,
264
+ "eval_validation.parquet_accuracy_conf50__16": 0.8357771260997068,
265
+ "eval_validation.parquet_accuracy_conf50__17": 0.8456260720411664,
266
+ "eval_validation.parquet_accuracy_conf50__18": 0.8858131487889274,
267
+ "eval_validation.parquet_accuracy_conf50__19": 0.8277945619335347,
268
+ "eval_validation.parquet_accuracy_conf50__2": 0.8041237113402062,
269
+ "eval_validation.parquet_accuracy_conf50__20": 0.7794117647058824,
270
+ "eval_validation.parquet_accuracy_conf50__21": 0.9312242090784044,
271
+ "eval_validation.parquet_accuracy_conf50__22": 0.8710691823899371,
272
+ "eval_validation.parquet_accuracy_conf50__23": 0.8797250859106529,
273
+ "eval_validation.parquet_accuracy_conf50__3": 0.8885448916408669,
274
+ "eval_validation.parquet_accuracy_conf50__4": 0.8637137989778535,
275
+ "eval_validation.parquet_accuracy_conf50__5": 0.8816793893129771,
276
+ "eval_validation.parquet_accuracy_conf50__6": 0.9115890083632019,
277
+ "eval_validation.parquet_accuracy_conf50__7": 0.7850678733031674,
278
+ "eval_validation.parquet_accuracy_conf50__8": 0.8923512747875354,
279
+ "eval_validation.parquet_accuracy_conf50__9": 0.878345498783455,
280
+ "eval_validation.parquet_accuracy_conf75": 0.9145129224652088,
281
+ "eval_validation.parquet_accuracy_conf75__0": 0.9239130434782609,
282
+ "eval_validation.parquet_accuracy_conf75__1": 0.85546875,
283
+ "eval_validation.parquet_accuracy_conf75__10": 0.9493670886075949,
284
+ "eval_validation.parquet_accuracy_conf75__11": 0.9121338912133892,
285
+ "eval_validation.parquet_accuracy_conf75__12": 0.9450980392156862,
286
+ "eval_validation.parquet_accuracy_conf75__13": 0.8294573643410853,
287
+ "eval_validation.parquet_accuracy_conf75__14": 0.9129692832764505,
288
+ "eval_validation.parquet_accuracy_conf75__15": 0.9392265193370166,
289
+ "eval_validation.parquet_accuracy_conf75__16": 0.8972602739726028,
290
+ "eval_validation.parquet_accuracy_conf75__17": 0.8893280632411067,
291
+ "eval_validation.parquet_accuracy_conf75__18": 0.9288389513108615,
292
+ "eval_validation.parquet_accuracy_conf75__19": 0.8925925925925926,
293
+ "eval_validation.parquet_accuracy_conf75__2": 0.8809523809523809,
294
+ "eval_validation.parquet_accuracy_conf75__20": 0.8479087452471483,
295
+ "eval_validation.parquet_accuracy_conf75__21": 0.9502923976608187,
296
+ "eval_validation.parquet_accuracy_conf75__22": 0.9064748201438849,
297
+ "eval_validation.parquet_accuracy_conf75__23": 0.9404761904761905,
298
+ "eval_validation.parquet_accuracy_conf75__3": 0.9228070175438596,
299
+ "eval_validation.parquet_accuracy_conf75__4": 0.9067961165048544,
300
+ "eval_validation.parquet_accuracy_conf75__5": 0.9224137931034483,
301
+ "eval_validation.parquet_accuracy_conf75__6": 0.9374185136897001,
302
+ "eval_validation.parquet_accuracy_conf75__7": 0.8453333333333334,
303
+ "eval_validation.parquet_accuracy_conf75__8": 0.9276729559748428,
304
+ "eval_validation.parquet_accuracy_conf75__9": 0.9305354558610709,
305
+ "eval_validation.parquet_accuracy_label_average": 0.8470618003326092,
306
+ "eval_validation.parquet_accuracy_label_average_conf50": 0.8580792494248763,
307
+ "eval_validation.parquet_accuracy_label_average_conf75": 0.9081139825449239,
308
+ "eval_validation.parquet_accuracy_label_min": 0.7473118279569892,
309
+ "eval_validation.parquet_accuracy_label_min_conf50": 0.7711598746081505,
310
+ "eval_validation.parquet_accuracy_label_min_conf75": 0.8294573643410853,
311
+ "eval_validation.parquet_loss": 0.4807276427745819,
312
+ "eval_validation.parquet_proportion_conf50": 0.9788,
313
+ "eval_validation.parquet_proportion_conf75": 0.8551,
314
+ "eval_validation.parquet_runtime": 8.2886,
315
+ "eval_validation.parquet_samples_per_second": 1206.475,
316
+ "eval_validation.parquet_steps_per_second": 37.763,
317
+ "num_input_tokens_seen": 1176307328,
318
+ "step": 471
319
+ },
320
+ {
321
+ "epoch": 3.1856,
322
+ "grad_norm": 6.53125,
323
+ "learning_rate": 3.988603988603989e-05,
324
+ "loss": 1.5392,
325
+ "num_input_tokens_seen": 1250925472,
326
+ "step": 500
327
+ },
328
+ {
329
+ "epoch": 3.8256,
330
+ "grad_norm": 7.0625,
331
+ "learning_rate": 2.564102564102564e-05,
332
+ "loss": 1.4928,
333
+ "num_input_tokens_seen": 1499507040,
334
+ "step": 600
335
+ },
336
+ {
337
+ "epoch": 4.0,
338
+ "eval_validation.parquet_accuracy": 0.8567,
339
+ "eval_validation.parquet_accuracy__0": 0.8725490196078431,
340
+ "eval_validation.parquet_accuracy__1": 0.8006134969325154,
341
+ "eval_validation.parquet_accuracy__10": 0.9105882352941177,
342
+ "eval_validation.parquet_accuracy__11": 0.888030888030888,
343
+ "eval_validation.parquet_accuracy__12": 0.9086115992970123,
344
+ "eval_validation.parquet_accuracy__13": 0.7419354838709677,
345
+ "eval_validation.parquet_accuracy__14": 0.8271954674220963,
346
+ "eval_validation.parquet_accuracy__15": 0.851528384279476,
347
+ "eval_validation.parquet_accuracy__16": 0.8510028653295129,
348
+ "eval_validation.parquet_accuracy__17": 0.8817567567567568,
349
+ "eval_validation.parquet_accuracy__18": 0.8664383561643836,
350
+ "eval_validation.parquet_accuracy__19": 0.8088235294117647,
351
+ "eval_validation.parquet_accuracy__2": 0.8059701492537313,
352
+ "eval_validation.parquet_accuracy__20": 0.7492957746478873,
353
+ "eval_validation.parquet_accuracy__21": 0.924863387978142,
354
+ "eval_validation.parquet_accuracy__22": 0.8379204892966361,
355
+ "eval_validation.parquet_accuracy__23": 0.863481228668942,
356
+ "eval_validation.parquet_accuracy__3": 0.8779761904761905,
357
+ "eval_validation.parquet_accuracy__4": 0.8464106844741235,
358
+ "eval_validation.parquet_accuracy__5": 0.8731343283582089,
359
+ "eval_validation.parquet_accuracy__6": 0.881316098707403,
360
+ "eval_validation.parquet_accuracy__7": 0.810752688172043,
361
+ "eval_validation.parquet_accuracy__8": 0.8904494382022472,
362
+ "eval_validation.parquet_accuracy__9": 0.8396674584323041,
363
+ "eval_validation.parquet_accuracy_conf50": 0.8674908050674295,
364
+ "eval_validation.parquet_accuracy_conf50__0": 0.8811881188118812,
365
+ "eval_validation.parquet_accuracy_conf50__1": 0.8119122257053292,
366
+ "eval_validation.parquet_accuracy_conf50__10": 0.9148936170212766,
367
+ "eval_validation.parquet_accuracy_conf50__11": 0.8980392156862745,
368
+ "eval_validation.parquet_accuracy_conf50__12": 0.9178571428571428,
369
+ "eval_validation.parquet_accuracy_conf50__13": 0.7771428571428571,
370
+ "eval_validation.parquet_accuracy_conf50__14": 0.838150289017341,
371
+ "eval_validation.parquet_accuracy_conf50__15": 0.8654708520179372,
372
+ "eval_validation.parquet_accuracy_conf50__16": 0.8563049853372434,
373
+ "eval_validation.parquet_accuracy_conf50__17": 0.8867924528301887,
374
+ "eval_validation.parquet_accuracy_conf50__18": 0.8719723183391004,
375
+ "eval_validation.parquet_accuracy_conf50__19": 0.824773413897281,
376
+ "eval_validation.parquet_accuracy_conf50__2": 0.8247422680412371,
377
+ "eval_validation.parquet_accuracy_conf50__20": 0.7676470588235295,
378
+ "eval_validation.parquet_accuracy_conf50__21": 0.9270976616231087,
379
+ "eval_validation.parquet_accuracy_conf50__22": 0.8522012578616353,
380
+ "eval_validation.parquet_accuracy_conf50__23": 0.8625429553264605,
381
+ "eval_validation.parquet_accuracy_conf50__3": 0.9009287925696594,
382
+ "eval_validation.parquet_accuracy_conf50__4": 0.858603066439523,
383
+ "eval_validation.parquet_accuracy_conf50__5": 0.8854961832061069,
384
+ "eval_validation.parquet_accuracy_conf50__6": 0.8936678614097969,
385
+ "eval_validation.parquet_accuracy_conf50__7": 0.8235294117647058,
386
+ "eval_validation.parquet_accuracy_conf50__8": 0.8923512747875354,
387
+ "eval_validation.parquet_accuracy_conf50__9": 0.8503649635036497,
388
+ "eval_validation.parquet_accuracy_conf75": 0.9156823763302537,
389
+ "eval_validation.parquet_accuracy_conf75__0": 0.9347826086956522,
390
+ "eval_validation.parquet_accuracy_conf75__1": 0.88671875,
391
+ "eval_validation.parquet_accuracy_conf75__10": 0.9544303797468354,
392
+ "eval_validation.parquet_accuracy_conf75__11": 0.9288702928870293,
393
+ "eval_validation.parquet_accuracy_conf75__12": 0.9509803921568627,
394
+ "eval_validation.parquet_accuracy_conf75__13": 0.8217054263565892,
395
+ "eval_validation.parquet_accuracy_conf75__14": 0.9027303754266212,
396
+ "eval_validation.parquet_accuracy_conf75__15": 0.9392265193370166,
397
+ "eval_validation.parquet_accuracy_conf75__16": 0.910958904109589,
398
+ "eval_validation.parquet_accuracy_conf75__17": 0.9308300395256917,
399
+ "eval_validation.parquet_accuracy_conf75__18": 0.9138576779026217,
400
+ "eval_validation.parquet_accuracy_conf75__19": 0.8851851851851852,
401
+ "eval_validation.parquet_accuracy_conf75__2": 0.9047619047619048,
402
+ "eval_validation.parquet_accuracy_conf75__20": 0.8365019011406845,
403
+ "eval_validation.parquet_accuracy_conf75__21": 0.9488304093567251,
404
+ "eval_validation.parquet_accuracy_conf75__22": 0.89568345323741,
405
+ "eval_validation.parquet_accuracy_conf75__23": 0.9365079365079365,
406
+ "eval_validation.parquet_accuracy_conf75__3": 0.9333333333333333,
407
+ "eval_validation.parquet_accuracy_conf75__4": 0.8932038834951457,
408
+ "eval_validation.parquet_accuracy_conf75__5": 0.9224137931034483,
409
+ "eval_validation.parquet_accuracy_conf75__6": 0.9230769230769231,
410
+ "eval_validation.parquet_accuracy_conf75__7": 0.8746666666666667,
411
+ "eval_validation.parquet_accuracy_conf75__8": 0.9339622641509434,
412
+ "eval_validation.parquet_accuracy_conf75__9": 0.91027496382055,
413
+ "eval_validation.parquet_accuracy_label_average": 0.8504296666277162,
414
+ "eval_validation.parquet_accuracy_label_average_conf50": 0.8618195935008668,
415
+ "eval_validation.parquet_accuracy_label_average_conf75": 0.9113955826658905,
416
+ "eval_validation.parquet_accuracy_label_min": 0.7419354838709677,
417
+ "eval_validation.parquet_accuracy_label_min_conf50": 0.7676470588235295,
418
+ "eval_validation.parquet_accuracy_label_min_conf75": 0.8217054263565892,
419
+ "eval_validation.parquet_loss": 0.47853514552116394,
420
+ "eval_validation.parquet_proportion_conf50": 0.9788,
421
+ "eval_validation.parquet_proportion_conf75": 0.8551,
422
+ "eval_validation.parquet_runtime": 8.3896,
423
+ "eval_validation.parquet_samples_per_second": 1191.949,
424
+ "eval_validation.parquet_steps_per_second": 37.308,
425
+ "num_input_tokens_seen": 1566401088,
426
+ "step": 628
427
+ },
428
+ {
429
+ "epoch": 4.4608,
430
+ "grad_norm": 7.5625,
431
+ "learning_rate": 1.1396011396011397e-05,
432
+ "loss": 1.4653,
433
+ "num_input_tokens_seen": 1745927840,
434
+ "step": 700
435
+ },
436
+ {
437
+ "epoch": 4.9728,
438
+ "eval_validation.parquet_accuracy": 0.8571,
439
+ "eval_validation.parquet_accuracy__0": 0.8725490196078431,
440
+ "eval_validation.parquet_accuracy__1": 0.7914110429447853,
441
+ "eval_validation.parquet_accuracy__10": 0.9105882352941177,
442
+ "eval_validation.parquet_accuracy__11": 0.8918918918918919,
443
+ "eval_validation.parquet_accuracy__12": 0.9033391915641477,
444
+ "eval_validation.parquet_accuracy__13": 0.7419354838709677,
445
+ "eval_validation.parquet_accuracy__14": 0.8314447592067988,
446
+ "eval_validation.parquet_accuracy__15": 0.8558951965065502,
447
+ "eval_validation.parquet_accuracy__16": 0.8481375358166189,
448
+ "eval_validation.parquet_accuracy__17": 0.875,
449
+ "eval_validation.parquet_accuracy__18": 0.8595890410958904,
450
+ "eval_validation.parquet_accuracy__19": 0.8117647058823529,
451
+ "eval_validation.parquet_accuracy__2": 0.8109452736318408,
452
+ "eval_validation.parquet_accuracy__20": 0.7436619718309859,
453
+ "eval_validation.parquet_accuracy__21": 0.9262295081967213,
454
+ "eval_validation.parquet_accuracy__22": 0.8440366972477065,
455
+ "eval_validation.parquet_accuracy__23": 0.863481228668942,
456
+ "eval_validation.parquet_accuracy__3": 0.8809523809523809,
457
+ "eval_validation.parquet_accuracy__4": 0.8497495826377296,
458
+ "eval_validation.parquet_accuracy__5": 0.8731343283582089,
459
+ "eval_validation.parquet_accuracy__6": 0.8883666274970623,
460
+ "eval_validation.parquet_accuracy__7": 0.7956989247311828,
461
+ "eval_validation.parquet_accuracy__8": 0.9044943820224719,
462
+ "eval_validation.parquet_accuracy__9": 0.8420427553444181,
463
+ "eval_validation.parquet_accuracy_conf50": 0.8678994687372292,
464
+ "eval_validation.parquet_accuracy_conf50__0": 0.8811881188118812,
465
+ "eval_validation.parquet_accuracy_conf50__1": 0.8025078369905956,
466
+ "eval_validation.parquet_accuracy_conf50__10": 0.9148936170212766,
467
+ "eval_validation.parquet_accuracy_conf50__11": 0.8980392156862745,
468
+ "eval_validation.parquet_accuracy_conf50__12": 0.9125,
469
+ "eval_validation.parquet_accuracy_conf50__13": 0.7771428571428571,
470
+ "eval_validation.parquet_accuracy_conf50__14": 0.8424855491329479,
471
+ "eval_validation.parquet_accuracy_conf50__15": 0.8699551569506726,
472
+ "eval_validation.parquet_accuracy_conf50__16": 0.8533724340175953,
473
+ "eval_validation.parquet_accuracy_conf50__17": 0.8799313893653516,
474
+ "eval_validation.parquet_accuracy_conf50__18": 0.8650519031141869,
475
+ "eval_validation.parquet_accuracy_conf50__19": 0.8277945619335347,
476
+ "eval_validation.parquet_accuracy_conf50__2": 0.8298969072164949,
477
+ "eval_validation.parquet_accuracy_conf50__20": 0.7647058823529411,
478
+ "eval_validation.parquet_accuracy_conf50__21": 0.9284731774415406,
479
+ "eval_validation.parquet_accuracy_conf50__22": 0.8584905660377359,
480
+ "eval_validation.parquet_accuracy_conf50__23": 0.8625429553264605,
481
+ "eval_validation.parquet_accuracy_conf50__3": 0.9040247678018576,
482
+ "eval_validation.parquet_accuracy_conf50__4": 0.8620102214650767,
483
+ "eval_validation.parquet_accuracy_conf50__5": 0.8854961832061069,
484
+ "eval_validation.parquet_accuracy_conf50__6": 0.9008363201911589,
485
+ "eval_validation.parquet_accuracy_conf50__7": 0.8076923076923077,
486
+ "eval_validation.parquet_accuracy_conf50__8": 0.9065155807365439,
487
+ "eval_validation.parquet_accuracy_conf50__9": 0.8527980535279805,
488
+ "eval_validation.parquet_accuracy_conf75": 0.9163840486492808,
489
+ "eval_validation.parquet_accuracy_conf75__0": 0.9347826086956522,
490
+ "eval_validation.parquet_accuracy_conf75__1": 0.87890625,
491
+ "eval_validation.parquet_accuracy_conf75__10": 0.9518987341772152,
492
+ "eval_validation.parquet_accuracy_conf75__11": 0.9288702928870293,
493
+ "eval_validation.parquet_accuracy_conf75__12": 0.9470588235294117,
494
+ "eval_validation.parquet_accuracy_conf75__13": 0.8217054263565892,
495
+ "eval_validation.parquet_accuracy_conf75__14": 0.9061433447098977,
496
+ "eval_validation.parquet_accuracy_conf75__15": 0.9447513812154696,
497
+ "eval_validation.parquet_accuracy_conf75__16": 0.910958904109589,
498
+ "eval_validation.parquet_accuracy_conf75__17": 0.924901185770751,
499
+ "eval_validation.parquet_accuracy_conf75__18": 0.9101123595505618,
500
+ "eval_validation.parquet_accuracy_conf75__19": 0.8888888888888888,
501
+ "eval_validation.parquet_accuracy_conf75__2": 0.9047619047619048,
502
+ "eval_validation.parquet_accuracy_conf75__20": 0.8326996197718631,
503
+ "eval_validation.parquet_accuracy_conf75__21": 0.9502923976608187,
504
+ "eval_validation.parquet_accuracy_conf75__22": 0.9028776978417267,
505
+ "eval_validation.parquet_accuracy_conf75__23": 0.9325396825396826,
506
+ "eval_validation.parquet_accuracy_conf75__3": 0.9368421052631579,
507
+ "eval_validation.parquet_accuracy_conf75__4": 0.9009708737864077,
508
+ "eval_validation.parquet_accuracy_conf75__5": 0.9224137931034483,
509
+ "eval_validation.parquet_accuracy_conf75__6": 0.9308996088657105,
510
+ "eval_validation.parquet_accuracy_conf75__7": 0.864,
511
+ "eval_validation.parquet_accuracy_conf75__8": 0.940251572327044,
512
+ "eval_validation.parquet_accuracy_conf75__9": 0.9117221418234442,
513
+ "eval_validation.parquet_accuracy_label_average": 0.8506808235334006,
514
+ "eval_validation.parquet_accuracy_label_average_conf50": 0.8620143984651407,
515
+ "eval_validation.parquet_accuracy_label_average_conf75": 0.9116353999015111,
516
+ "eval_validation.parquet_accuracy_label_min": 0.7419354838709677,
517
+ "eval_validation.parquet_accuracy_label_min_conf50": 0.7647058823529411,
518
+ "eval_validation.parquet_accuracy_label_min_conf75": 0.8217054263565892,
519
+ "eval_validation.parquet_loss": 0.47900858521461487,
520
+ "eval_validation.parquet_proportion_conf50": 0.9788,
521
+ "eval_validation.parquet_proportion_conf75": 0.8551,
522
+ "eval_validation.parquet_runtime": 8.446,
523
+ "eval_validation.parquet_samples_per_second": 1183.995,
524
+ "eval_validation.parquet_steps_per_second": 37.059,
525
+ "num_input_tokens_seen": 1949274656,
526
+ "step": 780
527
+ },
528
+ {
529
+ "epoch": 4.9728,
530
+ "num_input_tokens_seen": 1949274656,
531
+ "step": 780,
532
+ "total_flos": 1.297523316772307e+18,
533
+ "train_loss": 1.6634563641670423,
534
+ "train_runtime": 573.9155,
535
+ "train_samples_per_second": 696.967,
536
+ "train_steps_per_second": 1.359
537
+ }
538
+ ],
539
+ "logging_steps": 100,
540
+ "max_steps": 780,
541
+ "num_input_tokens_seen": 1949274656,
542
+ "num_train_epochs": 5,
543
+ "save_steps": 500,
544
+ "stateful_callbacks": {
545
+ "TrainerControl": {
546
+ "args": {
547
+ "should_epoch_stop": false,
548
+ "should_evaluate": false,
549
+ "should_log": false,
550
+ "should_save": true,
551
+ "should_training_stop": true
552
+ },
553
+ "attributes": {}
554
+ }
555
+ },
556
+ "total_flos": 1.297523316772307e+18,
557
+ "train_batch_size": 32,
558
+ "trial_name": null,
559
+ "trial_params": null
560
+ }
training_args.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:e7aaeac28840f0dd1915ce1f6f689f1e4cc1c4110cd16f4981ce08ea92145ce3
3
+ size 6840