tinyllama-1.1b-sum-dpo-full_LR1e-7_3epochs_old
This model is a fine-tuned version of martimfasantos/tinyllama-1.1b-sum-sft-full_old on the openai/summarize_from_feedback dataset. It achieves the following results on the evaluation set:
- Loss: 0.6382
- Rewards/chosen: -0.8614
- Rewards/rejected: -1.0551
- Rewards/accuracies: 0.6341
- Rewards/margins: 0.1937
- Logps/rejected: -168.6898
- Logps/chosen: -144.8481
- Logits/rejected: -2.0951
- Logits/chosen: -2.1077
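
The reward figures above are tied together by simple arithmetic: the margin is the chosen reward minus the rejected reward. The sketch below checks that against the reported numbers. It also illustrates why the loss in the training log starts near 0.6931: assuming the standard sigmoid DPO objective (an assumption, since the card does not state the loss type), a zero margin gives ln 2. The helper name `dpo_sigmoid_loss` is hypothetical and not taken from the training code.

```python
import math

# Values copied from the evaluation results listed above.
rewards_chosen = -0.8614
rewards_rejected = -1.0551
reported_margin = 0.1937

# The margin is the gap between the chosen and rejected rewards.
margin = rewards_chosen - rewards_rejected
assert abs(margin - reported_margin) < 1e-4


def dpo_sigmoid_loss(reward_margin: float) -> float:
    """Per-example loss -log(sigmoid(margin)).

    Assumes the standard sigmoid DPO objective; the card does not confirm it.
    """
    return math.log1p(math.exp(-reward_margin))


# With a zero margin (policy and reference agree), the loss equals ln 2.
initial_loss = dpo_sigmoid_loss(0.0)
print(round(margin, 4), round(initial_loss, 4))  # 0.1937 0.6931
```

The reported loss of 0.6382 is an average over per-example losses, so it need not equal `dpo_sigmoid_loss(0.1937)`, which is computed from the average margin.
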
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-07
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 2
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3
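
To make the scheduler settings above concrete, here is a minimal sketch of a cosine schedule with linear warmup, written in plain Python rather than calling the training library. The function name `cosine_lr_with_warmup` and the total of 17,400 steps are assumptions for illustration: the total is read off the last logged step in the results table, and the real run may have had a few more steps.

```python
import math


def cosine_lr_with_warmup(step, total_steps, peak_lr=1e-7, warmup_ratio=0.1):
    """Learning rate at `step`: linear warmup to `peak_lr`, then cosine decay to zero."""
    warmup_steps = round(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to the peak learning rate.
        return peak_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Assumed total step count, taken from the last logged step (17400 at epoch 2.9979).
TOTAL_STEPS = 17_400

for step in (0, 870, 1_740, 9_570, 17_400):
    print(step, cosine_lr_with_warmup(step, TOTAL_STEPS))
```

Under these assumptions the rate peaks at 1e-07 around step 1,740 and falls to roughly half of that midway through the decay phase.
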
Training results
Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
---|---|---|---|---|---|---|---|---|---|---|---|
0.6931 | 0.0172 | 100 | 0.6932 | -0.0000 | 0.0000 | 0.4993 | -0.0000 | -63.1760 | -58.7121 | -3.1570 | -3.1626 |
0.6932 | 0.0345 | 200 | 0.6932 | -0.0000 | 0.0000 | 0.4902 | -0.0001 | -63.1777 | -58.7161 | -3.1578 | -3.1634 |
0.6932 | 0.0517 | 300 | 0.6932 | 0.0001 | 0.0001 | 0.4847 | -0.0001 | -63.1684 | -58.7055 | -3.1576 | -3.1633 |
0.6932 | 0.0689 | 400 | 0.6932 | 0.0001 | 0.0001 | 0.4814 | -0.0001 | -63.1658 | -58.7068 | -3.1575 | -3.1631 |
0.6931 | 0.0861 | 500 | 0.6932 | 0.0001 | 0.0001 | 0.4847 | -0.0000 | -63.1715 | -58.7052 | -3.1577 | -3.1633 |
0.6929 | 0.1034 | 600 | 0.6931 | 0.0002 | 0.0002 | 0.5037 | 0.0000 | -63.1560 | -58.6876 | -3.1571 | -3.1628 |
0.693 | 0.1206 | 700 | 0.6931 | 0.0003 | 0.0001 | 0.5214 | 0.0002 | -63.1660 | -58.6822 | -3.1562 | -3.1619 |
0.6927 | 0.1378 | 800 | 0.6931 | 0.0006 | 0.0005 | 0.5204 | 0.0001 | -63.1322 | -58.6491 | -3.1561 | -3.1618 |
0.6927 | 0.1551 | 900 | 0.6930 | 0.0008 | 0.0005 | 0.5300 | 0.0003 | -63.1317 | -58.6345 | -3.1554 | -3.1610 |
0.6928 | 0.1723 | 1000 | 0.6930 | 0.0011 | 0.0007 | 0.5258 | 0.0003 | -63.1075 | -58.6060 | -3.1540 | -3.1596 |
0.6922 | 0.1895 | 1100 | 0.6929 | 0.0013 | 0.0007 | 0.5455 | 0.0006 | -63.1103 | -58.5820 | -3.1523 | -3.1579 |
0.6921 | 0.2068 | 1200 | 0.6927 | 0.0017 | 0.0008 | 0.5574 | 0.0009 | -63.1011 | -58.5416 | -3.1500 | -3.1556 |
0.692 | 0.2240 | 1300 | 0.6925 | 0.0020 | 0.0007 | 0.5599 | 0.0013 | -63.1123 | -58.5097 | -3.1479 | -3.1535 |
0.6898 | 0.2412 | 1400 | 0.6923 | 0.0021 | 0.0002 | 0.5743 | 0.0018 | -63.1581 | -58.5058 | -3.1443 | -3.1500 |
0.6889 | 0.2584 | 1500 | 0.6920 | 0.0017 | -0.0007 | 0.5827 | 0.0024 | -63.2512 | -58.5426 | -3.1406 | -3.1462 |
0.69 | 0.2757 | 1600 | 0.6917 | 0.0011 | -0.0018 | 0.5785 | 0.0030 | -63.3644 | -58.5982 | -3.1355 | -3.1411 |
0.6897 | 0.2929 | 1700 | 0.6913 | 0.0001 | -0.0037 | 0.5727 | 0.0038 | -63.5467 | -58.6985 | -3.1294 | -3.1351 |
0.6857 | 0.3101 | 1800 | 0.6910 | -0.0016 | -0.0061 | 0.5734 | 0.0045 | -63.7882 | -58.8688 | -3.1244 | -3.1301 |
0.6866 | 0.3274 | 1900 | 0.6907 | -0.0038 | -0.0090 | 0.5843 | 0.0052 | -64.0830 | -59.0939 | -3.1188 | -3.1245 |
0.6872 | 0.3446 | 2000 | 0.6903 | -0.0075 | -0.0134 | 0.5862 | 0.0060 | -64.5228 | -59.4572 | -3.1120 | -3.1176 |
0.6854 | 0.3618 | 2100 | 0.6899 | -0.0124 | -0.0194 | 0.5813 | 0.0070 | -65.1230 | -59.9534 | -3.1057 | -3.1113 |
0.6786 | 0.3790 | 2200 | 0.6894 | -0.0185 | -0.0267 | 0.5836 | 0.0082 | -65.8538 | -60.5638 | -3.0978 | -3.1035 |
0.6801 | 0.3963 | 2300 | 0.6889 | -0.0230 | -0.0323 | 0.5915 | 0.0093 | -66.4100 | -61.0095 | -3.0912 | -3.0969 |
0.683 | 0.4135 | 2400 | 0.6882 | -0.0304 | -0.0413 | 0.5867 | 0.0108 | -67.3051 | -61.7559 | -3.0824 | -3.0881 |
0.6853 | 0.4307 | 2500 | 0.6876 | -0.0392 | -0.0515 | 0.5841 | 0.0123 | -68.3329 | -62.6367 | -3.0733 | -3.0790 |
0.6775 | 0.4480 | 2600 | 0.6870 | -0.0464 | -0.0600 | 0.5834 | 0.0136 | -69.1773 | -63.3517 | -3.0671 | -3.0728 |
0.6788 | 0.4652 | 2700 | 0.6864 | -0.0532 | -0.0681 | 0.5895 | 0.0150 | -69.9938 | -64.0275 | -3.0610 | -3.0668 |
0.6781 | 0.4824 | 2800 | 0.6860 | -0.0581 | -0.0740 | 0.5876 | 0.0159 | -70.5769 | -64.5225 | -3.0538 | -3.0595 |
0.6796 | 0.4997 | 2900 | 0.6857 | -0.0610 | -0.0777 | 0.5892 | 0.0166 | -70.9456 | -64.8128 | -3.0460 | -3.0517 |
0.6805 | 0.5169 | 3000 | 0.6853 | -0.0658 | -0.0834 | 0.5994 | 0.0176 | -71.5177 | -65.2877 | -3.0368 | -3.0425 |
0.673 | 0.5341 | 3100 | 0.6849 | -0.0663 | -0.0847 | 0.5987 | 0.0184 | -71.6468 | -65.3387 | -3.0324 | -3.0381 |
0.6747 | 0.5513 | 3200 | 0.6842 | -0.0780 | -0.0982 | 0.6027 | 0.0202 | -72.9963 | -66.5094 | -3.0209 | -3.0267 |
0.6743 | 0.5686 | 3300 | 0.6836 | -0.0836 | -0.1053 | 0.6022 | 0.0216 | -73.7081 | -67.0762 | -3.0078 | -3.0136 |
0.6653 | 0.5858 | 3400 | 0.6833 | -0.0846 | -0.1069 | 0.6011 | 0.0222 | -73.8674 | -67.1758 | -2.9991 | -3.0049 |
0.6764 | 0.6030 | 3500 | 0.6827 | -0.0900 | -0.1136 | 0.5999 | 0.0236 | -74.5369 | -67.7069 | -2.9912 | -2.9971 |
0.6737 | 0.6203 | 3600 | 0.6823 | -0.0962 | -0.1207 | 0.6104 | 0.0245 | -75.2502 | -68.3295 | -2.9812 | -2.9871 |
0.6664 | 0.6375 | 3700 | 0.6816 | -0.1051 | -0.1313 | 0.6080 | 0.0263 | -76.3151 | -69.2178 | -2.9692 | -2.9751 |
0.6667 | 0.6547 | 3800 | 0.6807 | -0.1172 | -0.1456 | 0.6085 | 0.0284 | -77.7401 | -70.4287 | -2.9595 | -2.9654 |
0.6678 | 0.6720 | 3900 | 0.6799 | -0.1299 | -0.1602 | 0.6092 | 0.0304 | -79.2047 | -71.6971 | -2.9499 | -2.9558 |
0.6671 | 0.6892 | 4000 | 0.6792 | -0.1408 | -0.1729 | 0.6078 | 0.0321 | -80.4742 | -72.7925 | -2.9368 | -2.9426 |
0.6554 | 0.7064 | 4100 | 0.6787 | -0.1458 | -0.1791 | 0.6120 | 0.0333 | -81.0925 | -73.2962 | -2.9179 | -2.9238 |
0.6742 | 0.7236 | 4200 | 0.6780 | -0.1580 | -0.1932 | 0.6127 | 0.0352 | -82.5005 | -74.5101 | -2.9044 | -2.9103 |
0.6632 | 0.7409 | 4300 | 0.6774 | -0.1672 | -0.2038 | 0.6078 | 0.0366 | -83.5592 | -75.4285 | -2.8933 | -2.8992 |
0.6639 | 0.7581 | 4400 | 0.6765 | -0.1825 | -0.2215 | 0.6064 | 0.0390 | -85.3312 | -76.9653 | -2.8808 | -2.8867 |
0.6617 | 0.7753 | 4500 | 0.6753 | -0.2011 | -0.2431 | 0.6078 | 0.0421 | -87.4948 | -78.8183 | -2.8704 | -2.8763 |
0.6446 | 0.7926 | 4600 | 0.6742 | -0.2184 | -0.2634 | 0.6080 | 0.0450 | -89.5165 | -80.5508 | -2.8604 | -2.8664 |
0.6536 | 0.8098 | 4700 | 0.6733 | -0.2347 | -0.2821 | 0.6078 | 0.0474 | -91.3895 | -82.1787 | -2.8507 | -2.8567 |
0.661 | 0.8270 | 4800 | 0.6723 | -0.2469 | -0.2967 | 0.6071 | 0.0498 | -92.8502 | -83.4062 | -2.8410 | -2.8470 |
0.6655 | 0.8442 | 4900 | 0.6714 | -0.2622 | -0.3144 | 0.6059 | 0.0522 | -94.6208 | -84.9348 | -2.8302 | -2.8362 |
0.65 | 0.8615 | 5000 | 0.6706 | -0.2730 | -0.3273 | 0.5957 | 0.0544 | -95.9136 | -86.0080 | -2.8112 | -2.8172 |
0.6625 | 0.8787 | 5100 | 0.6695 | -0.2893 | -0.3467 | 0.5997 | 0.0574 | -97.8500 | -87.6453 | -2.8012 | -2.8071 |
0.6509 | 0.8959 | 5200 | 0.6690 | -0.2924 | -0.3512 | 0.5985 | 0.0588 | -98.3012 | -87.9499 | -2.7931 | -2.7991 |
0.6469 | 0.9132 | 5300 | 0.6686 | -0.2979 | -0.3577 | 0.5978 | 0.0598 | -98.9499 | -88.5002 | -2.7822 | -2.7882 |
0.6482 | 0.9304 | 5400 | 0.6680 | -0.3024 | -0.3637 | 0.6039 | 0.0613 | -99.5495 | -88.9507 | -2.7739 | -2.7799 |
0.639 | 0.9476 | 5500 | 0.6673 | -0.3146 | -0.3781 | 0.6066 | 0.0635 | -100.9877 | -90.1737 | -2.7615 | -2.7675 |
0.6515 | 0.9649 | 5600 | 0.6668 | -0.3113 | -0.3759 | 0.6080 | 0.0647 | -100.7733 | -89.8396 | -2.7543 | -2.7603 |
0.6512 | 0.9821 | 5700 | 0.6657 | -0.3303 | -0.3982 | 0.6094 | 0.0680 | -103.0038 | -91.7385 | -2.7432 | -2.7493 |
0.6323 | 0.9993 | 5800 | 0.6645 | -0.3552 | -0.4268 | 0.6078 | 0.0716 | -105.8584 | -94.2304 | -2.7257 | -2.7318 |
0.632 | 1.0165 | 5900 | 0.6629 | -0.3911 | -0.4682 | 0.6085 | 0.0771 | -109.9998 | -97.8232 | -2.7023 | -2.7085 |
0.654 | 1.0338 | 6000 | 0.6632 | -0.3807 | -0.4571 | 0.6076 | 0.0764 | -108.8926 | -96.7834 | -2.6907 | -2.6969 |
0.6293 | 1.0510 | 6100 | 0.6624 | -0.3916 | -0.4703 | 0.6111 | 0.0787 | -110.2114 | -97.8768 | -2.6768 | -2.6831 |
0.6314 | 1.0682 | 6200 | 0.6611 | -0.4228 | -0.5060 | 0.6120 | 0.0832 | -113.7813 | -100.9947 | -2.6635 | -2.6697 |
0.6526 | 1.0855 | 6300 | 0.6599 | -0.4394 | -0.5262 | 0.6145 | 0.0869 | -115.8035 | -102.6482 | -2.6530 | -2.6593 |
0.6347 | 1.1027 | 6400 | 0.6593 | -0.4394 | -0.5278 | 0.6180 | 0.0884 | -115.9650 | -102.6523 | -2.6435 | -2.6499 |
0.6393 | 1.1199 | 6500 | 0.6588 | -0.4468 | -0.5370 | 0.6238 | 0.0901 | -116.8754 | -103.3932 | -2.6289 | -2.6354 |
0.6374 | 1.1371 | 6600 | 0.6590 | -0.4501 | -0.5403 | 0.6166 | 0.0901 | -117.2051 | -103.7237 | -2.6225 | -2.6289 |
0.6359 | 1.1544 | 6700 | 0.6581 | -0.4668 | -0.5605 | 0.6190 | 0.0936 | -119.2262 | -105.3939 | -2.6058 | -2.6123 |
0.6146 | 1.1716 | 6800 | 0.6567 | -0.4994 | -0.5980 | 0.6173 | 0.0987 | -122.9848 | -108.6496 | -2.5870 | -2.5937 |
0.6367 | 1.1888 | 6900 | 0.6561 | -0.5093 | -0.6101 | 0.6227 | 0.1008 | -124.1880 | -109.6397 | -2.5753 | -2.5820 |
0.6185 | 1.2061 | 7000 | 0.6549 | -0.5406 | -0.6465 | 0.6159 | 0.1059 | -127.8333 | -112.7735 | -2.5638 | -2.5706 |
0.6226 | 1.2233 | 7100 | 0.6558 | -0.5185 | -0.6213 | 0.6180 | 0.1028 | -125.3109 | -110.5579 | -2.5582 | -2.5651 |
0.6173 | 1.2405 | 7200 | 0.6550 | -0.5301 | -0.6358 | 0.6162 | 0.1057 | -126.7555 | -111.7189 | -2.5488 | -2.5557 |
0.6472 | 1.2578 | 7300 | 0.6553 | -0.5020 | -0.6054 | 0.6197 | 0.1034 | -123.7222 | -108.9138 | -2.5474 | -2.5543 |
0.6388 | 1.2750 | 7400 | 0.6552 | -0.4984 | -0.6021 | 0.6206 | 0.1037 | -123.3937 | -108.5536 | -2.5418 | -2.5489 |
0.641 | 1.2922 | 7500 | 0.6543 | -0.5020 | -0.6078 | 0.6227 | 0.1058 | -123.9613 | -108.9147 | -2.5332 | -2.5404 |
0.6721 | 1.3094 | 7600 | 0.6531 | -0.5286 | -0.6388 | 0.6229 | 0.1102 | -127.0605 | -111.5723 | -2.5152 | -2.5224 |
0.6262 | 1.3267 | 7700 | 0.6528 | -0.5440 | -0.6568 | 0.6199 | 0.1127 | -128.8555 | -113.1147 | -2.4986 | -2.5058 |
0.6077 | 1.3439 | 7800 | 0.6520 | -0.5730 | -0.6901 | 0.6231 | 0.1172 | -132.1913 | -116.0070 | -2.4824 | -2.4898 |
0.6293 | 1.3611 | 7900 | 0.6511 | -0.5869 | -0.7073 | 0.6234 | 0.1204 | -133.9143 | -117.4017 | -2.4749 | -2.4824 |
0.6065 | 1.3784 | 8000 | 0.6502 | -0.5931 | -0.7166 | 0.6236 | 0.1235 | -134.8416 | -118.0241 | -2.4667 | -2.4743 |
0.6328 | 1.3956 | 8100 | 0.6499 | -0.6051 | -0.7307 | 0.6255 | 0.1256 | -136.2457 | -119.2178 | -2.4558 | -2.4635 |
0.646 | 1.4128 | 8200 | 0.6494 | -0.6002 | -0.7264 | 0.6231 | 0.1262 | -135.8235 | -118.7345 | -2.4523 | -2.4600 |
0.6384 | 1.4300 | 8300 | 0.6500 | -0.5815 | -0.7052 | 0.6234 | 0.1237 | -133.6977 | -116.8619 | -2.4491 | -2.4568 |
0.6173 | 1.4473 | 8400 | 0.6504 | -0.5677 | -0.6897 | 0.6217 | 0.1219 | -132.1456 | -115.4836 | -2.4449 | -2.4526 |
0.6041 | 1.4645 | 8500 | 0.6501 | -0.5732 | -0.6969 | 0.6271 | 0.1237 | -132.8701 | -116.0278 | -2.4292 | -2.4370 |
0.6635 | 1.4817 | 8600 | 0.6490 | -0.6018 | -0.7304 | 0.6252 | 0.1286 | -136.2163 | -118.8894 | -2.4140 | -2.4220 |
0.6377 | 1.4990 | 8700 | 0.6499 | -0.5709 | -0.6951 | 0.6255 | 0.1243 | -132.6951 | -115.7986 | -2.4168 | -2.4247 |
0.6376 | 1.5162 | 8800 | 0.6488 | -0.5866 | -0.7147 | 0.6301 | 0.1281 | -134.6506 | -117.3752 | -2.4074 | -2.4155 |
0.6174 | 1.5334 | 8900 | 0.6478 | -0.6255 | -0.7594 | 0.6336 | 0.1339 | -139.1249 | -121.2650 | -2.3887 | -2.3969 |
0.6228 | 1.5507 | 9000 | 0.6478 | -0.6245 | -0.7587 | 0.6292 | 0.1342 | -139.0503 | -121.1639 | -2.3815 | -2.3898 |
0.6372 | 1.5679 | 9100 | 0.6480 | -0.6203 | -0.7539 | 0.6336 | 0.1335 | -138.5676 | -120.7465 | -2.3769 | -2.3852 |
0.6 | 1.5851 | 9200 | 0.6474 | -0.6400 | -0.7768 | 0.6329 | 0.1368 | -140.8612 | -122.7150 | -2.3665 | -2.3751 |
0.5989 | 1.6023 | 9300 | 0.6468 | -0.6474 | -0.7867 | 0.6341 | 0.1394 | -141.8543 | -123.4491 | -2.3576 | -2.3662 |
0.614 | 1.6196 | 9400 | 0.6459 | -0.6825 | -0.8279 | 0.6368 | 0.1454 | -145.9700 | -126.9618 | -2.3413 | -2.3500 |
0.596 | 1.6368 | 9500 | 0.6456 | -0.6809 | -0.8268 | 0.6368 | 0.1459 | -145.8628 | -126.8059 | -2.3333 | -2.3420 |
0.6174 | 1.6540 | 9600 | 0.6448 | -0.7214 | -0.8733 | 0.6364 | 0.1519 | -150.5126 | -130.8547 | -2.3123 | -2.3212 |
0.6332 | 1.6713 | 9700 | 0.6452 | -0.6900 | -0.8381 | 0.6357 | 0.1480 | -146.9875 | -127.7156 | -2.3143 | -2.3232 |
0.6115 | 1.6885 | 9800 | 0.6452 | -0.6884 | -0.8368 | 0.6341 | 0.1484 | -146.8605 | -127.5543 | -2.3134 | -2.3225 |
0.5539 | 1.7057 | 9900 | 0.6446 | -0.6932 | -0.8433 | 0.6322 | 0.1501 | -147.5115 | -128.0289 | -2.3106 | -2.3197 |
0.5881 | 1.7229 | 10000 | 0.6446 | -0.6998 | -0.8514 | 0.6357 | 0.1516 | -148.3202 | -128.6942 | -2.3004 | -2.3096 |
0.6197 | 1.7402 | 10100 | 0.6450 | -0.6864 | -0.8362 | 0.6343 | 0.1498 | -146.7977 | -127.3522 | -2.2940 | -2.3033 |
0.6029 | 1.7574 | 10200 | 0.6433 | -0.7383 | -0.8977 | 0.6336 | 0.1593 | -152.9491 | -132.5467 | -2.2721 | -2.2816 |
0.6441 | 1.7746 | 10300 | 0.6435 | -0.7404 | -0.8998 | 0.6324 | 0.1594 | -153.1610 | -132.7534 | -2.2664 | -2.2760 |
0.5718 | 1.7919 | 10400 | 0.6444 | -0.7047 | -0.8588 | 0.6341 | 0.1541 | -149.0603 | -129.1777 | -2.2712 | -2.2807 |
0.5866 | 1.8091 | 10500 | 0.6437 | -0.7266 | -0.8854 | 0.6343 | 0.1588 | -151.7161 | -131.3703 | -2.2598 | -2.2695 |
0.6278 | 1.8263 | 10600 | 0.6437 | -0.7187 | -0.8763 | 0.6348 | 0.1576 | -150.8070 | -130.5783 | -2.2553 | -2.2651 |
0.6083 | 1.8436 | 10700 | 0.6428 | -0.7398 | -0.9018 | 0.6306 | 0.1621 | -153.3647 | -132.6900 | -2.2435 | -2.2534 |
0.5999 | 1.8608 | 10800 | 0.6425 | -0.7467 | -0.9104 | 0.6324 | 0.1637 | -154.2222 | -133.3793 | -2.2412 | -2.2513 |
0.6016 | 1.8780 | 10900 | 0.6423 | -0.7546 | -0.9199 | 0.6343 | 0.1654 | -155.1725 | -134.1676 | -2.2317 | -2.2420 |
0.6056 | 1.8952 | 11000 | 0.6424 | -0.7430 | -0.9074 | 0.6303 | 0.1644 | -153.9158 | -133.0090 | -2.2336 | -2.2438 |
0.6068 | 1.9125 | 11100 | 0.6415 | -0.7764 | -0.9467 | 0.6315 | 0.1703 | -157.8523 | -136.3506 | -2.2170 | -2.2275 |
0.5907 | 1.9297 | 11200 | 0.6416 | -0.7643 | -0.9335 | 0.6324 | 0.1692 | -156.5323 | -135.1456 | -2.2154 | -2.2259 |
0.6504 | 1.9469 | 11300 | 0.6420 | -0.7478 | -0.9145 | 0.6289 | 0.1667 | -154.6342 | -133.4948 | -2.2172 | -2.2276 |
0.6037 | 1.9642 | 11400 | 0.6413 | -0.7627 | -0.9329 | 0.6296 | 0.1702 | -156.4750 | -134.9861 | -2.2093 | -2.2199 |
0.6435 | 1.9814 | 11500 | 0.6415 | -0.7615 | -0.9315 | 0.6301 | 0.1700 | -156.3274 | -134.8601 | -2.2078 | -2.2184 |
0.6037 | 1.9986 | 11600 | 0.6418 | -0.7425 | -0.9097 | 0.6294 | 0.1671 | -154.1468 | -132.9645 | -2.2119 | -2.2224 |
0.6036 | 2.0159 | 11700 | 0.6414 | -0.7444 | -0.9128 | 0.6289 | 0.1684 | -154.4553 | -133.1498 | -2.2068 | -2.2174 |
0.6111 | 2.0331 | 11800 | 0.6408 | -0.7710 | -0.9439 | 0.6285 | 0.1729 | -157.5724 | -135.8124 | -2.1917 | -2.2026 |
0.5739 | 2.0503 | 11900 | 0.6401 | -0.8062 | -0.9851 | 0.6283 | 0.1788 | -161.6872 | -139.3363 | -2.1752 | -2.1862 |
0.5807 | 2.0675 | 12000 | 0.6400 | -0.8128 | -0.9929 | 0.6327 | 0.1801 | -162.4718 | -139.9921 | -2.1663 | -2.1776 |
0.5904 | 2.0848 | 12100 | 0.6396 | -0.8183 | -0.9996 | 0.6317 | 0.1814 | -163.1447 | -140.5391 | -2.1626 | -2.1739 |
0.5722 | 2.1020 | 12200 | 0.6397 | -0.8246 | -1.0067 | 0.6327 | 0.1821 | -163.8479 | -141.1671 | -2.1591 | -2.1704 |
0.5874 | 2.1192 | 12300 | 0.6397 | -0.8221 | -1.0035 | 0.6343 | 0.1814 | -163.5287 | -140.9182 | -2.1576 | -2.1690 |
0.5575 | 2.1365 | 12400 | 0.6391 | -0.8641 | -1.0517 | 0.6341 | 0.1876 | -168.3473 | -145.1188 | -2.1426 | -2.1543 |
0.59 | 2.1537 | 12500 | 0.6392 | -0.8708 | -1.0586 | 0.6341 | 0.1878 | -169.0439 | -145.7953 | -2.1364 | -2.1481 |
0.6028 | 2.1709 | 12600 | 0.6394 | -0.8507 | -1.0363 | 0.6336 | 0.1856 | -166.8094 | -143.7794 | -2.1403 | -2.1519 |
0.5745 | 2.1881 | 12700 | 0.6394 | -0.8476 | -1.0328 | 0.6331 | 0.1852 | -166.4608 | -143.4725 | -2.1395 | -2.1511 |
0.6037 | 2.2054 | 12800 | 0.6395 | -0.8490 | -1.0347 | 0.6317 | 0.1857 | -166.6464 | -143.6127 | -2.1340 | -2.1457 |
0.5773 | 2.2226 | 12900 | 0.6393 | -0.8462 | -1.0320 | 0.6315 | 0.1858 | -166.3826 | -143.3317 | -2.1329 | -2.1446 |
0.5747 | 2.2398 | 13000 | 0.6391 | -0.8618 | -1.0498 | 0.6320 | 0.1880 | -168.1579 | -144.8899 | -2.1262 | -2.1381 |
0.5788 | 2.2571 | 13100 | 0.6392 | -0.8607 | -1.0489 | 0.6331 | 0.1882 | -168.0727 | -144.7845 | -2.1216 | -2.1335 |
0.6091 | 2.2743 | 13200 | 0.6390 | -0.8603 | -1.0494 | 0.6327 | 0.1891 | -168.1196 | -144.7427 | -2.1177 | -2.1296 |
0.6213 | 2.2915 | 13300 | 0.6393 | -0.8616 | -1.0503 | 0.6301 | 0.1886 | -168.2058 | -144.8738 | -2.1141 | -2.1261 |
0.5545 | 2.3088 | 13400 | 0.6397 | -0.8361 | -1.0209 | 0.6310 | 0.1848 | -165.2700 | -142.3214 | -2.1231 | -2.1350 |
0.5633 | 2.3260 | 13500 | 0.6392 | -0.8526 | -1.0406 | 0.6336 | 0.1879 | -167.2357 | -143.9755 | -2.1181 | -2.1301 |
0.5982 | 2.3432 | 13600 | 0.6391 | -0.8544 | -1.0431 | 0.6320 | 0.1886 | -167.4862 | -144.1549 | -2.1134 | -2.1255 |
0.6165 | 2.3604 | 13700 | 0.6390 | -0.8581 | -1.0475 | 0.6336 | 0.1894 | -167.9277 | -144.5217 | -2.1098 | -2.1221 |
0.5863 | 2.3777 | 13800 | 0.6393 | -0.8480 | -1.0361 | 0.6322 | 0.1881 | -166.7901 | -143.5142 | -2.1112 | -2.1233 |
0.6023 | 2.3949 | 13900 | 0.6395 | -0.8345 | -1.0207 | 0.6322 | 0.1862 | -165.2497 | -142.1660 | -2.1148 | -2.1269 |
0.551 | 2.4121 | 14000 | 0.6389 | -0.8440 | -1.0328 | 0.6331 | 0.1888 | -166.4650 | -143.1130 | -2.1104 | -2.1226 |
0.565 | 2.4294 | 14100 | 0.6394 | -0.8393 | -1.0266 | 0.6322 | 0.1874 | -165.8436 | -142.6391 | -2.1116 | -2.1238 |
0.555 | 2.4466 | 14200 | 0.6396 | -0.8346 | -1.0211 | 0.6317 | 0.1865 | -165.2906 | -142.1683 | -2.1129 | -2.1251 |
0.5303 | 2.4638 | 14300 | 0.6392 | -0.8468 | -1.0356 | 0.6313 | 0.1888 | -166.7382 | -143.3939 | -2.1079 | -2.1202 |
0.5998 | 2.4810 | 14400 | 0.6390 | -0.8530 | -1.0429 | 0.6350 | 0.1899 | -167.4716 | -144.0141 | -2.1038 | -2.1161 |
0.5688 | 2.4983 | 14500 | 0.6387 | -0.8590 | -1.0506 | 0.6338 | 0.1916 | -168.2381 | -144.6089 | -2.1014 | -2.1137 |
0.5601 | 2.5155 | 14600 | 0.6386 | -0.8520 | -1.0429 | 0.6341 | 0.1909 | -167.4715 | -143.9122 | -2.1035 | -2.1158 |
0.5694 | 2.5327 | 14700 | 0.6385 | -0.8549 | -1.0466 | 0.6336 | 0.1917 | -167.8379 | -144.2034 | -2.1025 | -2.1148 |
0.5762 | 2.5500 | 14800 | 0.6388 | -0.8514 | -1.0423 | 0.6327 | 0.1909 | -167.4103 | -143.8544 | -2.1027 | -2.1151 |
0.5944 | 2.5672 | 14900 | 0.6388 | -0.8497 | -1.0403 | 0.6322 | 0.1906 | -167.2102 | -143.6825 | -2.1028 | -2.1151 |
0.5766 | 2.5844 | 15000 | 0.6386 | -0.8528 | -1.0444 | 0.6327 | 0.1916 | -167.6185 | -143.9918 | -2.1007 | -2.1131 |
0.6066 | 2.6017 | 15100 | 0.6387 | -0.8545 | -1.0460 | 0.6334 | 0.1915 | -167.7836 | -144.1632 | -2.1001 | -2.1125 |
0.557 | 2.6189 | 15200 | 0.6385 | -0.8591 | -1.0515 | 0.6331 | 0.1924 | -168.3309 | -144.6236 | -2.0980 | -2.1104 |
0.5819 | 2.6361 | 15300 | 0.6384 | -0.8621 | -1.0552 | 0.6329 | 0.1931 | -168.6976 | -144.9198 | -2.0966 | -2.1092 |
0.6353 | 2.6533 | 15400 | 0.6384 | -0.8617 | -1.0548 | 0.6331 | 0.1931 | -168.6601 | -144.8850 | -2.0966 | -2.1091 |
0.6352 | 2.6706 | 15500 | 0.6385 | -0.8591 | -1.0515 | 0.6341 | 0.1924 | -168.3342 | -144.6245 | -2.0974 | -2.1098 |
0.5882 | 2.6878 | 15600 | 0.6384 | -0.8581 | -1.0511 | 0.6329 | 0.1930 | -168.2865 | -144.5229 | -2.0972 | -2.1097 |
0.5698 | 2.7050 | 15700 | 0.6384 | -0.8579 | -1.0506 | 0.6334 | 0.1928 | -168.2427 | -144.4972 | -2.0972 | -2.1098 |
0.5774 | 2.7223 | 15800 | 0.6383 | -0.8576 | -1.0507 | 0.6317 | 0.1931 | -168.2498 | -144.4737 | -2.0970 | -2.1095 |
0.5948 | 2.7395 | 15900 | 0.6385 | -0.8583 | -1.0511 | 0.6329 | 0.1928 | -168.2885 | -144.5436 | -2.0963 | -2.1088 |
0.5977 | 2.7567 | 16000 | 0.6382 | -0.8592 | -1.0527 | 0.6343 | 0.1935 | -168.4506 | -144.6316 | -2.0959 | -2.1084 |
0.5412 | 2.7739 | 16100 | 0.6385 | -0.8607 | -1.0535 | 0.6341 | 0.1927 | -168.5258 | -144.7848 | -2.0957 | -2.1081 |
0.6015 | 2.7912 | 16200 | 0.6385 | -0.8599 | -1.0527 | 0.6320 | 0.1927 | -168.4485 | -144.7054 | -2.0961 | -2.1086 |
0.5921 | 2.8084 | 16300 | 0.6382 | -0.8602 | -1.0537 | 0.6338 | 0.1935 | -168.5526 | -144.7336 | -2.0959 | -2.1084 |
0.5958 | 2.8256 | 16400 | 0.6384 | -0.8602 | -1.0534 | 0.6322 | 0.1932 | -168.5213 | -144.7309 | -2.0953 | -2.1078 |
0.5977 | 2.8429 | 16500 | 0.6384 | -0.8601 | -1.0531 | 0.6334 | 0.1931 | -168.4950 | -144.7180 | -2.0952 | -2.1077 |
0.6289 | 2.8601 | 16600 | 0.6382 | -0.8611 | -1.0549 | 0.6338 | 0.1937 | -168.6687 | -144.8262 | -2.0951 | -2.1076 |
0.6271 | 2.8773 | 16700 | 0.6385 | -0.8602 | -1.0531 | 0.6336 | 0.1929 | -168.4876 | -144.7302 | -2.0954 | -2.1080 |
0.5918 | 2.8946 | 16800 | 0.6384 | -0.8615 | -1.0546 | 0.6331 | 0.1931 | -168.6371 | -144.8581 | -2.0953 | -2.1078 |
0.5885 | 2.9118 | 16900 | 0.6383 | -0.8598 | -1.0533 | 0.6331 | 0.1935 | -168.5110 | -144.6941 | -2.0954 | -2.1080 |
0.6058 | 2.9290 | 17000 | 0.6384 | -0.8615 | -1.0547 | 0.6331 | 0.1933 | -168.6532 | -144.8587 | -2.0949 | -2.1075 |
0.5841 | 2.9462 | 17100 | 0.6384 | -0.8599 | -1.0531 | 0.6322 | 0.1932 | -168.4870 | -144.7006 | -2.0956 | -2.1082 |
0.6214 | 2.9635 | 17200 | 0.6385 | -0.8609 | -1.0538 | 0.6341 | 0.1930 | -168.5645 | -144.7976 | -2.0955 | -2.1081 |
0.5905 | 2.9807 | 17300 | 0.6385 | -0.8611 | -1.0541 | 0.6327 | 0.1931 | -168.5945 | -144.8186 | -2.0951 | -2.1076 |
0.5878 | 2.9979 | 17400 | 0.6382 | -0.8614 | -1.0551 | 0.6341 | 0.1937 | -168.6898 | -144.8481 | -2.0951 | -2.1077 |
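
For readers who want to work with the log programmatically, the following sketch parses a handful of rows copied from the table above and prints the validation loss and reward margin near the end of each epoch. The helper `split_cells` and the choice of rows are illustrative only; the excerpt is not the full log.

```python
# Column names and a few rows copied verbatim from the table above.
HEADER = (
    "Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | "
    "Rewards/rejected | Rewards/accuracies | Rewards/margins | "
    "Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |"
)
ROWS = """
0.6931 | 0.0172 | 100 | 0.6932 | -0.0000 | 0.0000 | 0.4993 | -0.0000 | -63.1760 | -58.7121 | -3.1570 | -3.1626 |
0.6323 | 0.9993 | 5800 | 0.6645 | -0.3552 | -0.4268 | 0.6078 | 0.0716 | -105.8584 | -94.2304 | -2.7257 | -2.7318 |
0.6037 | 1.9986 | 11600 | 0.6418 | -0.7425 | -0.9097 | 0.6294 | 0.1671 | -154.1468 | -132.9645 | -2.2119 | -2.2224 |
0.5878 | 2.9979 | 17400 | 0.6382 | -0.8614 | -1.0551 | 0.6341 | 0.1937 | -168.6898 | -144.8481 | -2.0951 | -2.1077 |
"""


def split_cells(line):
    """Split one pipe-delimited line into stripped cells, ignoring edge pipes."""
    return [cell.strip() for cell in line.strip().strip("|").split("|")]


columns = split_cells(HEADER)
rows = [
    dict(zip(columns, map(float, split_cells(line))))
    for line in ROWS.strip().splitlines()
]

for row in rows:
    print(int(row["Step"]), row["Validation Loss"], row["Rewards/margins"])
```

In this excerpt the validation loss falls and the reward margin grows from one epoch boundary to the next.
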
Framework versions
- Transformers 4.41.2
- Pytorch 2.1.2
- Datasets 2.19.2
- Tokenizers 0.19.1