2023-10-12 23:13:30,507 ----------------------------------------------------------------------------------------------------
2023-10-12 23:13:30,510 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=17, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-12 23:13:30,510 ----------------------------------------------------------------------------------------------------
2023-10-12 23:13:30,510 MultiCorpus: 20847 train + 1123 dev + 3350 test sentences
 - NER_HIPE_2022 Corpus: 20847 train + 1123 dev + 3350 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/newseye/de/with_doc_seperator
2023-10-12 23:13:30,510 ----------------------------------------------------------------------------------------------------
2023-10-12 23:13:30,510 Train:  20847 sentences
2023-10-12 23:13:30,510         (train_with_dev=False, train_with_test=False)
2023-10-12 23:13:30,511 ----------------------------------------------------------------------------------------------------
2023-10-12 23:13:30,511 Training Params:
2023-10-12 23:13:30,511  - learning_rate: "0.00016"
2023-10-12 23:13:30,511  - mini_batch_size: "4"
2023-10-12 23:13:30,511  - max_epochs: "10"
2023-10-12 23:13:30,511  - shuffle: "True"
2023-10-12 23:13:30,511 ----------------------------------------------------------------------------------------------------
2023-10-12 23:13:30,511 Plugins:
2023-10-12 23:13:30,511  - TensorboardLogger
2023-10-12 23:13:30,511  - LinearScheduler | warmup_fraction: '0.1'
2023-10-12 23:13:30,511 ----------------------------------------------------------------------------------------------------
2023-10-12 23:13:30,511 Final evaluation on model from best epoch (best-model.pt)
2023-10-12 23:13:30,511  - metric: "('micro avg', 'f1-score')"
2023-10-12 23:13:30,512 ----------------------------------------------------------------------------------------------------
2023-10-12 23:13:30,512 Computation:
2023-10-12 23:13:30,512  - compute on device: cuda:0
2023-10-12 23:13:30,512  - embedding storage: none
2023-10-12 23:13:30,512 ----------------------------------------------------------------------------------------------------
2023-10-12 23:13:30,512 Model training base path: "hmbench-newseye/de-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00016-poolingfirst-layers-1-crfFalse-5"
2023-10-12 23:13:30,512 ----------------------------------------------------------------------------------------------------
2023-10-12 23:13:30,512 ----------------------------------------------------------------------------------------------------
2023-10-12 23:13:30,512 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-12 23:16:08,563 epoch 1 - iter 521/5212 - loss 2.75555085 - time (sec): 158.05 - samples/sec: 257.52 - lr: 0.000016 - momentum: 0.000000
2023-10-12 23:18:42,724 epoch 1 - iter 1042/5212 - loss 2.30548429 - time (sec): 312.21 - samples/sec: 253.96 - lr: 0.000032 - momentum: 0.000000
2023-10-12 23:21:17,162 epoch 1 - iter 1563/5212 - loss 1.81068959 - time (sec): 466.65 - samples/sec: 246.98 - lr: 0.000048 - momentum: 0.000000
2023-10-12 23:23:52,349 epoch 1 - iter 2084/5212 - loss 1.47926246 - time (sec): 621.83 - samples/sec: 244.78 - lr: 0.000064 - momentum: 0.000000
2023-10-12 23:26:27,234 epoch 1 - iter 2605/5212 - loss 1.28430112 - time (sec): 776.72 - samples/sec: 244.12 - lr: 0.000080 - momentum: 0.000000
2023-10-12 23:29:02,351 epoch 1 - iter 3126/5212 - loss 1.12994919 - time (sec): 931.84 - samples/sec: 244.35 - lr: 0.000096 - momentum: 0.000000
2023-10-12 23:31:34,040 epoch 1 - iter 3647/5212 - loss 1.01667901 - time (sec): 1083.53 - samples/sec: 243.26 - lr: 0.000112 - momentum: 0.000000
2023-10-12 23:34:06,711 epoch 1 - iter 4168/5212 - loss 0.92769967 - time (sec): 1236.20 - samples/sec: 242.59 - lr: 0.000128 - momentum: 0.000000
2023-10-12 23:36:37,508 epoch 1 - iter 4689/5212 - loss 0.85937678 - time (sec): 1386.99 - samples/sec: 240.83 - lr: 0.000144 - momentum: 0.000000
2023-10-12 23:39:08,101 epoch 1 - iter 5210/5212 - loss 0.80186809 - time (sec): 1537.59 - samples/sec: 238.87 - lr: 0.000160 - momentum: 0.000000
2023-10-12 23:39:08,629 ----------------------------------------------------------------------------------------------------
2023-10-12 23:39:08,629 EPOCH 1 done: loss 0.8016 - lr: 0.000160
2023-10-12 23:39:46,522 DEV : loss 0.12198188155889511 - f1-score (micro avg) 0.2743
2023-10-12 23:39:46,580 saving best model
2023-10-12 23:39:47,523 ----------------------------------------------------------------------------------------------------
2023-10-12 23:42:21,633 epoch 2 - iter 521/5212 - loss 0.20982971 - time (sec): 154.11 - samples/sec: 232.22 - lr: 0.000158 - momentum: 0.000000
2023-10-12 23:44:56,075 epoch 2 - iter 1042/5212 - loss 0.18609627 - time (sec): 308.55 - samples/sec: 233.20 - lr: 0.000156 - momentum: 0.000000
2023-10-12 23:47:29,762 epoch 2 - iter 1563/5212 - loss 0.17643072 - time (sec): 462.24 - samples/sec: 230.90 - lr: 0.000155 - momentum: 0.000000
2023-10-12 23:50:04,642 epoch 2 - iter 2084/5212 - loss 0.17285415 - time (sec): 617.12 - samples/sec: 232.34 - lr: 0.000153 - momentum: 0.000000
2023-10-12 23:52:37,758 epoch 2 - iter 2605/5212 - loss 0.17102477 - time (sec): 770.23 - samples/sec: 233.35 - lr: 0.000151 - momentum: 0.000000
2023-10-12 23:55:11,394 epoch 2 - iter 3126/5212 - loss 0.16423488 - time (sec): 923.87 - samples/sec: 236.27 - lr: 0.000149 - momentum: 0.000000
2023-10-12 23:57:41,803 epoch 2 - iter 3647/5212 - loss 0.16019015 - time (sec): 1074.28 - samples/sec: 236.44 - lr: 0.000148 - momentum: 0.000000
2023-10-13 00:00:16,286 epoch 2 - iter 4168/5212 - loss 0.15881036 - time (sec): 1228.76 - samples/sec: 237.64 - lr: 0.000146 - momentum: 0.000000
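The `lr` column above ramps linearly from 0.000016 to the configured peak of 0.00016 over epoch 1 (10 % of the 10 × 5212 total batches, matching the LinearScheduler plugin's `warmup_fraction: '0.1'`), then decays linearly toward zero. A minimal sketch of that schedule, assuming pure linear warmup and decay; the function name is illustrative, not Flair's internal API:

```python
# Sketch of the linear warmup/decay schedule implied by the logged "lr" values.
TOTAL_STEPS = 10 * 5212      # max_epochs * mini-batches per epoch
PEAK_LR = 0.00016            # learning_rate from the Training Params block
WARMUP_FRACTION = 0.1        # from the LinearScheduler plugin

def scheduled_lr(step: int,
                 total_steps: int = TOTAL_STEPS,
                 peak_lr: float = PEAK_LR,
                 warmup_fraction: float = WARMUP_FRACTION) -> float:
    """Linear ramp to peak_lr over the warmup steps, then linear decay to 0."""
    warmup_steps = warmup_fraction * total_steps
    if step <= warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * (total_steps - step) / (total_steps - warmup_steps)

# End of warmup (epoch 1, iter 5212): lr reaches the configured peak.
print(round(scheduled_lr(5212), 6))   # 0.00016
# Epoch 5, iter 2605 (global step 4*5212 + 2605): matches the logged 0.000098.
print(round(scheduled_lr(23453), 6))  # 9.8e-05
```

The sketch reproduces the logged values at every checkpoint, e.g. 0.000016 at epoch 1 / iter 521 and 0.000000 at the final step.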
2023-10-13 00:02:49,463 epoch 2 - iter 4689/5212 - loss 0.15679286 - time (sec): 1381.94 - samples/sec: 238.64 - lr: 0.000144 - momentum: 0.000000
2023-10-13 00:05:22,498 epoch 2 - iter 5210/5212 - loss 0.15495624 - time (sec): 1534.97 - samples/sec: 239.24 - lr: 0.000142 - momentum: 0.000000
2023-10-13 00:05:23,131 ----------------------------------------------------------------------------------------------------
2023-10-13 00:05:23,132 EPOCH 2 done: loss 0.1550 - lr: 0.000142
2023-10-13 00:06:05,180 DEV : loss 0.1396581381559372 - f1-score (micro avg) 0.3729
2023-10-13 00:06:05,233 saving best model
2023-10-13 00:06:07,867 ----------------------------------------------------------------------------------------------------
2023-10-13 00:08:37,736 epoch 3 - iter 521/5212 - loss 0.11118070 - time (sec): 149.87 - samples/sec: 237.18 - lr: 0.000140 - momentum: 0.000000
2023-10-13 00:11:08,565 epoch 3 - iter 1042/5212 - loss 0.11386111 - time (sec): 300.69 - samples/sec: 229.34 - lr: 0.000139 - momentum: 0.000000
2023-10-13 00:13:43,196 epoch 3 - iter 1563/5212 - loss 0.11114589 - time (sec): 455.32 - samples/sec: 234.58 - lr: 0.000137 - momentum: 0.000000
2023-10-13 00:16:15,918 epoch 3 - iter 2084/5212 - loss 0.10856248 - time (sec): 608.05 - samples/sec: 234.81 - lr: 0.000135 - momentum: 0.000000
2023-10-13 00:18:50,169 epoch 3 - iter 2605/5212 - loss 0.10632374 - time (sec): 762.30 - samples/sec: 235.60 - lr: 0.000133 - momentum: 0.000000
2023-10-13 00:21:23,791 epoch 3 - iter 3126/5212 - loss 0.10650590 - time (sec): 915.92 - samples/sec: 236.45 - lr: 0.000132 - momentum: 0.000000
2023-10-13 00:23:56,304 epoch 3 - iter 3647/5212 - loss 0.10745965 - time (sec): 1068.43 - samples/sec: 235.83 - lr: 0.000130 - momentum: 0.000000
2023-10-13 00:26:30,815 epoch 3 - iter 4168/5212 - loss 0.10776351 - time (sec): 1222.94 - samples/sec: 237.02 - lr: 0.000128 - momentum: 0.000000
2023-10-13 00:29:07,508 epoch 3 - iter 4689/5212 - loss 0.10598550 - time (sec): 1379.64 - samples/sec: 238.76 - lr: 0.000126 - momentum: 0.000000
2023-10-13 00:31:42,918 epoch 3 - iter 5210/5212 - loss 0.10574885 - time (sec): 1535.05 - samples/sec: 239.29 - lr: 0.000124 - momentum: 0.000000
2023-10-13 00:31:43,399 ----------------------------------------------------------------------------------------------------
2023-10-13 00:31:43,400 EPOCH 3 done: loss 0.1057 - lr: 0.000124
2023-10-13 00:32:27,379 DEV : loss 0.20252744853496552 - f1-score (micro avg) 0.3827
2023-10-13 00:32:27,450 saving best model
2023-10-13 00:32:28,501 ----------------------------------------------------------------------------------------------------
2023-10-13 00:35:00,575 epoch 4 - iter 521/5212 - loss 0.08118201 - time (sec): 152.07 - samples/sec: 235.09 - lr: 0.000123 - momentum: 0.000000
2023-10-13 00:37:34,272 epoch 4 - iter 1042/5212 - loss 0.07254490 - time (sec): 305.77 - samples/sec: 239.09 - lr: 0.000121 - momentum: 0.000000
2023-10-13 00:40:09,337 epoch 4 - iter 1563/5212 - loss 0.07458337 - time (sec): 460.83 - samples/sec: 237.47 - lr: 0.000119 - momentum: 0.000000
2023-10-13 00:42:46,638 epoch 4 - iter 2084/5212 - loss 0.07549473 - time (sec): 618.13 - samples/sec: 233.58 - lr: 0.000117 - momentum: 0.000000
2023-10-13 00:45:24,792 epoch 4 - iter 2605/5212 - loss 0.07444442 - time (sec): 776.29 - samples/sec: 235.28 - lr: 0.000116 - momentum: 0.000000
2023-10-13 00:48:00,846 epoch 4 - iter 3126/5212 - loss 0.07631596 - time (sec): 932.34 - samples/sec: 237.22 - lr: 0.000114 - momentum: 0.000000
2023-10-13 00:50:35,261 epoch 4 - iter 3647/5212 - loss 0.07273579 - time (sec): 1086.76 - samples/sec: 237.66 - lr: 0.000112 - momentum: 0.000000
2023-10-13 00:53:10,990 epoch 4 - iter 4168/5212 - loss 0.07211108 - time (sec): 1242.49 - samples/sec: 237.20 - lr: 0.000110 - momentum: 0.000000
2023-10-13 00:55:43,694 epoch 4 - iter 4689/5212 - loss 0.07080145 - time (sec): 1395.19 - samples/sec: 237.14 - lr: 0.000108 - momentum: 0.000000
2023-10-13 00:58:15,862 epoch 4 - iter 5210/5212 - loss 0.07202352 - time (sec): 1547.36 - samples/sec: 237.35 - lr: 0.000107 - momentum: 0.000000
2023-10-13 00:58:16,436 ----------------------------------------------------------------------------------------------------
2023-10-13 00:58:16,437 EPOCH 4 done: loss 0.0720 - lr: 0.000107
2023-10-13 00:58:58,321 DEV : loss 0.2412302941083908 - f1-score (micro avg) 0.4013
2023-10-13 00:58:58,375 saving best model
2023-10-13 00:59:00,992 ----------------------------------------------------------------------------------------------------
2023-10-13 01:01:32,332 epoch 5 - iter 521/5212 - loss 0.03749208 - time (sec): 151.33 - samples/sec: 238.31 - lr: 0.000105 - momentum: 0.000000
2023-10-13 01:04:10,857 epoch 5 - iter 1042/5212 - loss 0.04298893 - time (sec): 309.86 - samples/sec: 226.57 - lr: 0.000103 - momentum: 0.000000
2023-10-13 01:06:53,769 epoch 5 - iter 1563/5212 - loss 0.04481796 - time (sec): 472.77 - samples/sec: 227.66 - lr: 0.000101 - momentum: 0.000000
2023-10-13 01:09:29,964 epoch 5 - iter 2084/5212 - loss 0.04877373 - time (sec): 628.97 - samples/sec: 228.57 - lr: 0.000100 - momentum: 0.000000
2023-10-13 01:12:04,424 epoch 5 - iter 2605/5212 - loss 0.04819527 - time (sec): 783.43 - samples/sec: 231.32 - lr: 0.000098 - momentum: 0.000000
2023-10-13 01:14:31,113 epoch 5 - iter 3126/5212 - loss 0.04778786 - time (sec): 930.12 - samples/sec: 238.07 - lr: 0.000096 - momentum: 0.000000
2023-10-13 01:17:00,477 epoch 5 - iter 3647/5212 - loss 0.04812251 - time (sec): 1079.48 - samples/sec: 240.73 - lr: 0.000094 - momentum: 0.000000
2023-10-13 01:19:24,217 epoch 5 - iter 4168/5212 - loss 0.04864547 - time (sec): 1223.22 - samples/sec: 239.50 - lr: 0.000092 - momentum: 0.000000
2023-10-13 01:21:53,900 epoch 5 - iter 4689/5212 - loss 0.04769308 - time (sec): 1372.90 - samples/sec: 240.77 - lr: 0.000091 - momentum: 0.000000
2023-10-13 01:24:22,597 epoch 5 - iter 5210/5212 - loss 0.04927007 - time (sec): 1521.60 - samples/sec: 241.37 - lr: 0.000089 - momentum: 0.000000
2023-10-13 01:24:23,115 ----------------------------------------------------------------------------------------------------
2023-10-13 01:24:23,116 EPOCH 5 done: loss 0.0493 - lr: 0.000089
2023-10-13 01:25:04,195 DEV : loss 0.3131482005119324 - f1-score (micro avg) 0.3699
2023-10-13 01:25:04,249 ----------------------------------------------------------------------------------------------------
2023-10-13 01:27:33,202 epoch 6 - iter 521/5212 - loss 0.02676268 - time (sec): 148.95 - samples/sec: 237.45 - lr: 0.000087 - momentum: 0.000000
2023-10-13 01:30:06,274 epoch 6 - iter 1042/5212 - loss 0.02907373 - time (sec): 302.02 - samples/sec: 246.54 - lr: 0.000085 - momentum: 0.000000
2023-10-13 01:32:37,064 epoch 6 - iter 1563/5212 - loss 0.02955452 - time (sec): 452.81 - samples/sec: 247.39 - lr: 0.000084 - momentum: 0.000000
2023-10-13 01:35:05,717 epoch 6 - iter 2084/5212 - loss 0.03048557 - time (sec): 601.47 - samples/sec: 244.93 - lr: 0.000082 - momentum: 0.000000
2023-10-13 01:37:31,785 epoch 6 - iter 2605/5212 - loss 0.03142527 - time (sec): 747.53 - samples/sec: 243.10 - lr: 0.000080 - momentum: 0.000000
2023-10-13 01:40:05,706 epoch 6 - iter 3126/5212 - loss 0.03190970 - time (sec): 901.46 - samples/sec: 245.25 - lr: 0.000078 - momentum: 0.000000
2023-10-13 01:42:35,905 epoch 6 - iter 3647/5212 - loss 0.03307510 - time (sec): 1051.65 - samples/sec: 246.50 - lr: 0.000076 - momentum: 0.000000
2023-10-13 01:45:03,240 epoch 6 - iter 4168/5212 - loss 0.03314185 - time (sec): 1198.99 - samples/sec: 244.33 - lr: 0.000075 - momentum: 0.000000
2023-10-13 01:47:31,293 epoch 6 - iter 4689/5212 - loss 0.03297464 - time (sec): 1347.04 - samples/sec: 244.12 - lr: 0.000073 - momentum: 0.000000
2023-10-13 01:49:59,777 epoch 6 - iter 5210/5212 - loss 0.03275868 - time (sec): 1495.53 - samples/sec: 245.41 - lr: 0.000071 - momentum: 0.000000
2023-10-13 01:50:00,631 ----------------------------------------------------------------------------------------------------
2023-10-13 01:50:00,632 EPOCH 6 done: loss 0.0327 - lr: 0.000071
2023-10-13 01:50:43,711 DEV : loss 0.34465113282203674 - f1-score (micro avg) 0.3972
2023-10-13 01:50:43,765 ----------------------------------------------------------------------------------------------------
2023-10-13 01:53:11,195 epoch 7 - iter 521/5212 - loss 0.02092661 - time (sec): 147.43 - samples/sec: 250.56 - lr: 0.000069 - momentum: 0.000000
2023-10-13 01:55:39,696 epoch 7 - iter 1042/5212 - loss 0.02158009 - time (sec): 295.93 - samples/sec: 247.29 - lr: 0.000068 - momentum: 0.000000
2023-10-13 01:58:08,942 epoch 7 - iter 1563/5212 - loss 0.02258803 - time (sec): 445.17 - samples/sec: 246.14 - lr: 0.000066 - momentum: 0.000000
2023-10-13 02:00:38,348 epoch 7 - iter 2084/5212 - loss 0.02191469 - time (sec): 594.58 - samples/sec: 246.67 - lr: 0.000064 - momentum: 0.000000
2023-10-13 02:03:08,286 epoch 7 - iter 2605/5212 - loss 0.02349063 - time (sec): 744.52 - samples/sec: 247.11 - lr: 0.000062 - momentum: 0.000000
2023-10-13 02:05:43,405 epoch 7 - iter 3126/5212 - loss 0.02309326 - time (sec): 899.64 - samples/sec: 250.85 - lr: 0.000060 - momentum: 0.000000
2023-10-13 02:08:12,692 epoch 7 - iter 3647/5212 - loss 0.02247120 - time (sec): 1048.92 - samples/sec: 249.77 - lr: 0.000059 - momentum: 0.000000
2023-10-13 02:10:39,254 epoch 7 - iter 4168/5212 - loss 0.02308498 - time (sec): 1195.49 - samples/sec: 247.04 - lr: 0.000057 - momentum: 0.000000
2023-10-13 02:13:07,955 epoch 7 - iter 4689/5212 - loss 0.02310097 - time (sec): 1344.19 - samples/sec: 246.18 - lr: 0.000055 - momentum: 0.000000
2023-10-13 02:15:34,751 epoch 7 - iter 5210/5212 - loss 0.02339839 - time (sec): 1490.98 - samples/sec: 246.36 - lr: 0.000053 - momentum: 0.000000
2023-10-13 02:15:35,257 ----------------------------------------------------------------------------------------------------
2023-10-13 02:15:35,257 EPOCH 7 done: loss 0.0234 - lr: 0.000053
2023-10-13 02:16:18,509 DEV : loss 0.4042404890060425 - f1-score (micro avg) 0.3928
2023-10-13 02:16:18,576 ----------------------------------------------------------------------------------------------------
2023-10-13 02:18:49,551 epoch 8 - iter 521/5212 - loss 0.01647227 - time (sec): 150.97 - samples/sec: 248.96 - lr: 0.000052 - momentum: 0.000000
2023-10-13 02:21:19,167 epoch 8 - iter 1042/5212 - loss 0.01566138 - time (sec): 300.59 - samples/sec: 243.90 - lr: 0.000050 - momentum: 0.000000
2023-10-13 02:23:50,869 epoch 8 - iter 1563/5212 - loss 0.01443419 - time (sec): 452.29 - samples/sec: 244.66 - lr: 0.000048 - momentum: 0.000000
2023-10-13 02:26:22,959 epoch 8 - iter 2084/5212 - loss 0.01528289 - time (sec): 604.38 - samples/sec: 245.53 - lr: 0.000046 - momentum: 0.000000
2023-10-13 02:28:51,081 epoch 8 - iter 2605/5212 - loss 0.01595605 - time (sec): 752.50 - samples/sec: 242.48 - lr: 0.000044 - momentum: 0.000000
2023-10-13 02:31:21,596 epoch 8 - iter 3126/5212 - loss 0.01540844 - time (sec): 903.02 - samples/sec: 242.94 - lr: 0.000043 - momentum: 0.000000
2023-10-13 02:33:52,731 epoch 8 - iter 3647/5212 - loss 0.01578988 - time (sec): 1054.15 - samples/sec: 240.51 - lr: 0.000041 - momentum: 0.000000
2023-10-13 02:36:27,735 epoch 8 - iter 4168/5212 - loss 0.01607841 - time (sec): 1209.16 - samples/sec: 239.39 - lr: 0.000039 - momentum: 0.000000
2023-10-13 02:39:07,916 epoch 8 - iter 4689/5212 - loss 0.01567215 - time (sec): 1369.34 - samples/sec: 240.95 - lr: 0.000037 - momentum: 0.000000
2023-10-13 02:41:43,466 epoch 8 - iter 5210/5212 - loss 0.01581146 - time (sec): 1524.89 - samples/sec: 240.91 - lr: 0.000036 - momentum: 0.000000
2023-10-13 02:41:43,935 ----------------------------------------------------------------------------------------------------
2023-10-13 02:41:43,935 EPOCH 8 done: loss 0.0158 - lr: 0.000036
2023-10-13 02:42:27,273 DEV : loss 0.3817499279975891 - f1-score (micro avg) 0.4289
2023-10-13 02:42:27,335 saving best model
2023-10-13 02:42:29,986 ----------------------------------------------------------------------------------------------------
2023-10-13 02:45:06,407 epoch 9 - iter 521/5212 - loss 0.01315958 - time (sec): 156.42 - samples/sec: 239.10 - lr: 0.000034 - momentum: 0.000000
2023-10-13 02:47:37,548 epoch 9 - iter 1042/5212 - loss 0.01366808 - time (sec): 307.56 - samples/sec: 228.85 - lr: 0.000032 - momentum: 0.000000
2023-10-13 02:50:12,090 epoch 9 - iter 1563/5212 - loss 0.01146850 - time (sec): 462.10 - samples/sec: 232.76 - lr: 0.000030 - momentum: 0.000000
2023-10-13 02:52:48,671 epoch 9 - iter 2084/5212 - loss 0.01157916 - time (sec): 618.68 - samples/sec: 233.94 - lr: 0.000028 - momentum: 0.000000
2023-10-13 02:55:26,055 epoch 9 - iter 2605/5212 - loss 0.01149897 - time (sec): 776.06 - samples/sec: 236.50 - lr: 0.000027 - momentum: 0.000000
2023-10-13 02:58:00,927 epoch 9 - iter 3126/5212 - loss 0.01120108 - time (sec): 930.94 - samples/sec: 235.44 - lr: 0.000025 - momentum: 0.000000
2023-10-13 03:00:36,555 epoch 9 - iter 3647/5212 - loss 0.01111557 - time (sec): 1086.56 - samples/sec: 236.89 - lr: 0.000023 - momentum: 0.000000
2023-10-13 03:03:11,248 epoch 9 - iter 4168/5212 - loss 0.01081071 - time (sec): 1241.26 - samples/sec: 236.81 - lr: 0.000021 - momentum: 0.000000
2023-10-13 03:05:48,626 epoch 9 - iter 4689/5212 - loss 0.01063389 - time (sec): 1398.63 - samples/sec: 236.23 - lr: 0.000020 - momentum: 0.000000
2023-10-13 03:08:25,451 epoch 9 - iter 5210/5212 - loss 0.01045969 - time (sec): 1555.46 - samples/sec: 236.17 - lr: 0.000018 - momentum: 0.000000
2023-10-13 03:08:25,935 ----------------------------------------------------------------------------------------------------
2023-10-13 03:08:25,935 EPOCH 9 done: loss 0.0105 - lr: 0.000018
2023-10-13 03:09:09,234 DEV : loss 0.48951616883277893 - f1-score (micro avg) 0.3897
2023-10-13 03:09:09,294 ----------------------------------------------------------------------------------------------------
2023-10-13 03:11:46,910 epoch 10 - iter 521/5212 - loss 0.00827833 - time (sec): 157.61 - samples/sec: 232.24 - lr: 0.000016 - momentum: 0.000000
2023-10-13 03:14:23,722 epoch 10 - iter 1042/5212 - loss 0.00812183 - time (sec): 314.43 - samples/sec: 230.92 - lr: 0.000014 - momentum: 0.000000
2023-10-13 03:16:55,973 epoch 10 - iter 1563/5212 - loss 0.00802548 - time (sec): 466.68 - samples/sec: 235.55 - lr: 0.000012 - momentum: 0.000000
2023-10-13 03:19:34,134 epoch 10 - iter 2084/5212 - loss 0.00748456 - time (sec): 624.84 - samples/sec: 232.56 - lr: 0.000011 - momentum: 0.000000
2023-10-13 03:22:19,244 epoch 10 - iter 2605/5212 - loss 0.00703452 - time (sec): 789.95 - samples/sec: 230.49 - lr: 0.000009 - momentum: 0.000000
2023-10-13 03:25:02,871 epoch 10 - iter 3126/5212 - loss 0.00680004 - time (sec): 953.57 - samples/sec: 230.00 - lr: 0.000007 - momentum: 0.000000
2023-10-13 03:27:34,900 epoch 10 - iter 3647/5212 - loss 0.00656432 - time (sec): 1105.60 - samples/sec: 229.70 - lr: 0.000005 - momentum: 0.000000
2023-10-13 03:30:05,199 epoch 10 - iter 4168/5212 - loss 0.00684065 - time (sec): 1255.90 - samples/sec: 231.89 - lr: 0.000004 - momentum: 0.000000
2023-10-13 03:32:40,991 epoch 10 - iter 4689/5212 - loss 0.00663298 - time (sec): 1411.69 - samples/sec: 232.75 - lr: 0.000002 - momentum: 0.000000
2023-10-13 03:35:17,592 epoch 10 - iter 5210/5212 - loss 0.00646053 - time (sec): 1568.30 - samples/sec: 234.21 - lr: 0.000000 - momentum: 0.000000
2023-10-13 03:35:18,093 ----------------------------------------------------------------------------------------------------
2023-10-13 03:35:18,094 EPOCH 10 done: loss 0.0065 - lr: 0.000000
2023-10-13 03:36:00,153 DEV : loss 0.4974716603755951 - f1-score (micro avg) 0.392
2023-10-13 03:36:01,225 ----------------------------------------------------------------------------------------------------
2023-10-13 03:36:01,227 Loading model from best epoch ...
2023-10-13 03:36:05,239 SequenceTagger predicts: Dictionary with 17 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-ORG, B-ORG, E-ORG, I-ORG, S-HumanProd, B-HumanProd, E-HumanProd, I-HumanProd
2023-10-13 03:37:50,682 Results:
- F-score (micro) 0.4388
- F-score (macro) 0.3278
- Accuracy 0.2862

By class:
              precision    recall  f1-score   support

         LOC     0.4679    0.4868    0.4772      1214
         PER     0.4373    0.4530    0.4450       808
         ORG     0.2928    0.3343    0.3122       353
   HumanProd     0.0909    0.0667    0.0769        15

   micro avg     0.4280    0.4502    0.4388      2390
   macro avg     0.3222    0.3352    0.3278      2390
weighted avg     0.4293    0.4502    0.4394      2390

2023-10-13 03:37:50,684 ----------------------------------------------------------------------------------------------------
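The F-scores in the final results block are the harmonic mean of the reported precision and recall. A quick sanity check that reproduces the logged per-class and micro-average values:

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the F1 score)."""
    return 2 * precision * recall / (precision + recall)

# Recompute F1 from the precision/recall columns of the results table.
print(round(f1(0.4679, 0.4868), 4))  # 0.4772  (LOC)
print(round(f1(0.4280, 0.4502), 4))  # 0.4388  (micro avg)
```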