# collapse_gemma-2-2b_hs2_accumulate_iter10_sftsd2
This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 1.1051
- Num Input Tokens Seen: 51561256
## Model description
More information needed
## Intended uses & limitations
More information needed
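Pending details from the author, the checkpoint should load like any Gemma-2 causal LM. Below is a minimal, untested sketch using the 🤗 Transformers `Auto*` API; the repo id is the one this card is published under, and the prompt is purely illustrative.

```python
# Minimal loading sketch (assumes a standard causal-LM checkpoint layout;
# not an official example from the model author).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter10_sftsd2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```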
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training; a `TrainingArguments` sketch reproducing them follows the list:
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 2
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
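The sketch below mirrors the settings above in code. The dataset, data collator, and `Trainer` wiring are not documented on this card, so only the optimization settings are reproduced; field names follow transformers 4.44, and the `output_dir` is an assumption taken from the model name.

```python
# A sketch of TrainingArguments matching the hyperparameters listed above.
# Only the documented optimization settings are reproduced here.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter10_sftsd2",
    learning_rate=8e-6,
    per_device_train_batch_size=8,    # train_batch_size: 8
    per_device_eval_batch_size=16,    # eval_batch_size: 16
    seed=2,
    gradient_accumulation_steps=16,   # 8 x 16 = 128 total train batch size
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,                # lr_scheduler_warmup_ratio
    num_train_epochs=1,
    adam_beta1=0.9,                   # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,                # epsilon=1e-08
)
```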
### Training results
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|---|---|---|---|---|
No log | 0 | 0 | 1.3909 | 0 |
1.6395 | 0.0052 | 5 | 1.3878 | 267176 |
1.6107 | 0.0105 | 10 | 1.3632 | 540632 |
1.4506 | 0.0157 | 15 | 1.3071 | 813344 |
1.4544 | 0.0210 | 20 | 1.2594 | 1084816 |
1.3649 | 0.0262 | 25 | 1.2178 | 1358992 |
1.2237 | 0.0315 | 30 | 1.1855 | 1623176 |
1.0773 | 0.0367 | 35 | 1.2001 | 1896376 |
1.0064 | 0.0420 | 40 | 1.2029 | 2174144 |
0.881 | 0.0472 | 45 | 1.2243 | 2448704 |
0.7121 | 0.0525 | 50 | 1.2347 | 2725896 |
0.6637 | 0.0577 | 55 | 1.2538 | 3006256 |
0.5756 | 0.0630 | 60 | 1.2486 | 3270560 |
0.6061 | 0.0682 | 65 | 1.2163 | 3542448 |
0.5961 | 0.0735 | 70 | 1.2462 | 3812144 |
0.5377 | 0.0787 | 75 | 1.2146 | 4086216 |
0.435 | 0.0840 | 80 | 1.2242 | 4349616 |
0.4432 | 0.0892 | 85 | 1.2068 | 4621736 |
0.4509 | 0.0945 | 90 | 1.2115 | 4896352 |
0.3424 | 0.0997 | 95 | 1.2045 | 5167360 |
0.3246 | 0.1050 | 100 | 1.2066 | 5441952 |
0.2544 | 0.1102 | 105 | 1.2112 | 5718120 |
0.2491 | 0.1155 | 110 | 1.1994 | 5989720 |
0.2749 | 0.1207 | 115 | 1.1977 | 6257904 |
0.3471 | 0.1260 | 120 | 1.1918 | 6525952 |
0.3511 | 0.1312 | 125 | 1.1877 | 6798440 |
0.2885 | 0.1365 | 130 | 1.1872 | 7070104 |
0.2825 | 0.1417 | 135 | 1.1857 | 7336696 |
0.3035 | 0.1470 | 140 | 1.1862 | 7601464 |
0.321 | 0.1522 | 145 | 1.1838 | 7873064 |
0.269 | 0.1575 | 150 | 1.1795 | 8142088 |
0.2346 | 0.1627 | 155 | 1.1806 | 8420848 |
0.2221 | 0.1680 | 160 | 1.1779 | 8688752 |
0.2448 | 0.1732 | 165 | 1.1811 | 8950224 |
0.2481 | 0.1785 | 170 | 1.1734 | 9221064 |
0.2657 | 0.1837 | 175 | 1.1731 | 9490952 |
0.2078 | 0.1890 | 180 | 1.1774 | 9760272 |
0.1971 | 0.1942 | 185 | 1.1717 | 10030424 |
0.3152 | 0.1995 | 190 | 1.1740 | 10294632 |
0.3539 | 0.2047 | 195 | 1.1652 | 10570384 |
0.2638 | 0.2100 | 200 | 1.1660 | 10838120 |
0.2894 | 0.2152 | 205 | 1.1641 | 11112144 |
0.2773 | 0.2205 | 210 | 1.1633 | 11381896 |
0.2081 | 0.2257 | 215 | 1.1643 | 11648168 |
0.2585 | 0.2310 | 220 | 1.1650 | 11919680 |
0.2927 | 0.2362 | 225 | 1.1607 | 12199888 |
0.2706 | 0.2415 | 230 | 1.1563 | 12469960 |
0.2444 | 0.2467 | 235 | 1.1585 | 12742352 |
0.3255 | 0.2520 | 240 | 1.1537 | 13019496 |
0.1864 | 0.2572 | 245 | 1.1556 | 13291024 |
0.1361 | 0.2624 | 250 | 1.1589 | 13560408 |
0.2366 | 0.2677 | 255 | 1.1531 | 13829232 |
0.1542 | 0.2729 | 260 | 1.1539 | 14105144 |
0.2822 | 0.2782 | 265 | 1.1512 | 14376360 |
0.1825 | 0.2834 | 270 | 1.1496 | 14649592 |
0.2948 | 0.2887 | 275 | 1.1551 | 14921560 |
0.2679 | 0.2939 | 280 | 1.1502 | 15194304 |
0.158 | 0.2992 | 285 | 1.1546 | 15456096 |
0.2154 | 0.3044 | 290 | 1.1482 | 15730608 |
0.2468 | 0.3097 | 295 | 1.1464 | 16007568 |
0.2797 | 0.3149 | 300 | 1.1468 | 16277896 |
0.2034 | 0.3202 | 305 | 1.1468 | 16552168 |
0.207 | 0.3254 | 310 | 1.1477 | 16819608 |
0.1315 | 0.3307 | 315 | 1.1431 | 17095096 |
0.2116 | 0.3359 | 320 | 1.1466 | 17360040 |
0.1816 | 0.3412 | 325 | 1.1430 | 17628168 |
0.1886 | 0.3464 | 330 | 1.1419 | 17892832 |
0.2278 | 0.3517 | 335 | 1.1409 | 18160480 |
0.2196 | 0.3569 | 340 | 1.1372 | 18428000 |
0.1998 | 0.3622 | 345 | 1.1400 | 18703480 |
0.1677 | 0.3674 | 350 | 1.1422 | 18971744 |
0.2223 | 0.3727 | 355 | 1.1361 | 19232072 |
0.2093 | 0.3779 | 360 | 1.1416 | 19504104 |
0.1497 | 0.3832 | 365 | 1.1375 | 19778184 |
0.1653 | 0.3884 | 370 | 1.1388 | 20048968 |
0.2041 | 0.3937 | 375 | 1.1405 | 20317848 |
0.2684 | 0.3989 | 380 | 1.1339 | 20595176 |
0.1934 | 0.4042 | 385 | 1.1342 | 20872472 |
0.1928 | 0.4094 | 390 | 1.1338 | 21145584 |
0.2346 | 0.4147 | 395 | 1.1327 | 21416912 |
0.2328 | 0.4199 | 400 | 1.1342 | 21690224 |
0.164 | 0.4252 | 405 | 1.1311 | 21964640 |
0.2526 | 0.4304 | 410 | 1.1341 | 22238344 |
0.2819 | 0.4357 | 415 | 1.1312 | 22510304 |
0.239 | 0.4409 | 420 | 1.1300 | 22782368 |
0.2154 | 0.4462 | 425 | 1.1295 | 23049152 |
0.1869 | 0.4514 | 430 | 1.1303 | 23318744 |
0.1654 | 0.4567 | 435 | 1.1283 | 23590656 |
0.2803 | 0.4619 | 440 | 1.1289 | 23861080 |
0.1311 | 0.4672 | 445 | 1.1297 | 24130976 |
0.1567 | 0.4724 | 450 | 1.1267 | 24404720 |
0.2344 | 0.4777 | 455 | 1.1300 | 24675848 |
0.2017 | 0.4829 | 460 | 1.1268 | 24943744 |
0.1729 | 0.4882 | 465 | 1.1274 | 25217656 |
0.2135 | 0.4934 | 470 | 1.1255 | 25486608 |
0.2117 | 0.4987 | 475 | 1.1246 | 25756672 |
0.1748 | 0.5039 | 480 | 1.1274 | 26023496 |
0.2428 | 0.5092 | 485 | 1.1259 | 26297464 |
0.2141 | 0.5144 | 490 | 1.1225 | 26569864 |
0.1829 | 0.5197 | 495 | 1.1264 | 26841160 |
0.2652 | 0.5249 | 500 | 1.1240 | 27106056 |
0.2427 | 0.5301 | 505 | 1.1212 | 27368448 |
0.3393 | 0.5354 | 510 | 1.1203 | 27642144 |
0.1654 | 0.5406 | 515 | 1.1219 | 27909856 |
0.2285 | 0.5459 | 520 | 1.1240 | 28180576 |
0.1352 | 0.5511 | 525 | 1.1222 | 28446952 |
0.2311 | 0.5564 | 530 | 1.1222 | 28719016 |
0.1766 | 0.5616 | 535 | 1.1206 | 28991728 |
0.1618 | 0.5669 | 540 | 1.1222 | 29266888 |
0.2667 | 0.5721 | 545 | 1.1228 | 29536384 |
0.1595 | 0.5774 | 550 | 1.1198 | 29803968 |
0.1975 | 0.5826 | 555 | 1.1186 | 30077232 |
0.16 | 0.5879 | 560 | 1.1219 | 30344632 |
0.1519 | 0.5931 | 565 | 1.1203 | 30617848 |
0.2028 | 0.5984 | 570 | 1.1168 | 30886552 |
0.1633 | 0.6036 | 575 | 1.1172 | 31159704 |
0.2041 | 0.6089 | 580 | 1.1185 | 31435184 |
0.2646 | 0.6141 | 585 | 1.1188 | 31703784 |
0.1321 | 0.6194 | 590 | 1.1178 | 31965392 |
0.2071 | 0.6246 | 595 | 1.1189 | 32245064 |
0.1997 | 0.6299 | 600 | 1.1199 | 32522944 |
0.2234 | 0.6351 | 605 | 1.1158 | 32799088 |
0.2085 | 0.6404 | 610 | 1.1142 | 33066680 |
0.2189 | 0.6456 | 615 | 1.1181 | 33336744 |
0.1711 | 0.6509 | 620 | 1.1165 | 33608120 |
0.1327 | 0.6561 | 625 | 1.1165 | 33877624 |
0.1207 | 0.6614 | 630 | 1.1182 | 34153432 |
0.1734 | 0.6666 | 635 | 1.1163 | 34422440 |
0.2455 | 0.6719 | 640 | 1.1142 | 34691400 |
0.139 | 0.6771 | 645 | 1.1165 | 34963144 |
0.1745 | 0.6824 | 650 | 1.1162 | 35231216 |
0.1507 | 0.6876 | 655 | 1.1132 | 35499080 |
0.193 | 0.6929 | 660 | 1.1139 | 35771152 |
0.1836 | 0.6981 | 665 | 1.1190 | 36049240 |
0.1602 | 0.7034 | 670 | 1.1146 | 36323080 |
0.2058 | 0.7086 | 675 | 1.1125 | 36593960 |
0.1137 | 0.7139 | 680 | 1.1166 | 36855704 |
0.1914 | 0.7191 | 685 | 1.1165 | 37128976 |
0.1955 | 0.7244 | 690 | 1.1131 | 37393376 |
0.2652 | 0.7296 | 695 | 1.1144 | 37668392 |
0.2041 | 0.7349 | 700 | 1.1130 | 37941816 |
0.2098 | 0.7401 | 705 | 1.1137 | 38208872 |
0.1394 | 0.7454 | 710 | 1.1147 | 38483504 |
0.1655 | 0.7506 | 715 | 1.1131 | 38756464 |
0.2204 | 0.7559 | 720 | 1.1126 | 39022560 |
0.2006 | 0.7611 | 725 | 1.1147 | 39289592 |
0.1907 | 0.7664 | 730 | 1.1154 | 39561424 |
0.2051 | 0.7716 | 735 | 1.1157 | 39830136 |
0.1807 | 0.7769 | 740 | 1.1130 | 40099128 |
0.2034 | 0.7821 | 745 | 1.1115 | 40373136 |
0.2266 | 0.7873 | 750 | 1.1132 | 40649000 |
0.1649 | 0.7926 | 755 | 1.1125 | 40913656 |
0.1717 | 0.7978 | 760 | 1.1108 | 41190264 |
0.1176 | 0.8031 | 765 | 1.1117 | 41462168 |
0.2482 | 0.8083 | 770 | 1.1131 | 41737160 |
0.196 | 0.8136 | 775 | 1.1114 | 42007320 |
0.1976 | 0.8188 | 780 | 1.1120 | 42267744 |
0.2019 | 0.8241 | 785 | 1.1101 | 42538272 |
0.199 | 0.8293 | 790 | 1.1103 | 42808000 |
0.1572 | 0.8346 | 795 | 1.1096 | 43082512 |
0.2039 | 0.8398 | 800 | 1.1095 | 43352040 |
0.1645 | 0.8451 | 805 | 1.1079 | 43617936 |
0.1579 | 0.8503 | 810 | 1.1087 | 43885320 |
0.2538 | 0.8556 | 815 | 1.1083 | 44160984 |
0.2116 | 0.8608 | 820 | 1.1074 | 44432952 |
0.1852 | 0.8661 | 825 | 1.1074 | 44700144 |
0.1959 | 0.8713 | 830 | 1.1082 | 44965704 |
0.1776 | 0.8766 | 835 | 1.1083 | 45239256 |
0.2194 | 0.8818 | 840 | 1.1074 | 45506880 |
0.2027 | 0.8871 | 845 | 1.1061 | 45776016 |
0.2268 | 0.8923 | 850 | 1.1060 | 46047232 |
0.1698 | 0.8976 | 855 | 1.1069 | 46315208 |
0.1642 | 0.9028 | 860 | 1.1063 | 46586904 |
0.1471 | 0.9081 | 865 | 1.1056 | 46858472 |
0.1546 | 0.9133 | 870 | 1.1059 | 47129256 |
0.1702 | 0.9186 | 875 | 1.1068 | 47396888 |
0.1736 | 0.9238 | 880 | 1.1095 | 47666808 |
0.2423 | 0.9291 | 885 | 1.1071 | 47934920 |
0.1543 | 0.9343 | 890 | 1.1041 | 48209376 |
0.2803 | 0.9396 | 895 | 1.1065 | 48483448 |
0.1918 | 0.9448 | 900 | 1.1076 | 48746968 |
0.1441 | 0.9501 | 905 | 1.1020 | 49017312 |
0.2352 | 0.9553 | 910 | 1.1043 | 49295008 |
0.1239 | 0.9606 | 915 | 1.1040 | 49564488 |
0.2222 | 0.9658 | 920 | 1.1051 | 49834544 |
0.1531 | 0.9711 | 925 | 1.1042 | 50105776 |
0.1774 | 0.9763 | 930 | 1.1037 | 50374256 |
0.1364 | 0.9816 | 935 | 1.1044 | 50639352 |
0.1993 | 0.9868 | 940 | 1.1025 | 50908056 |
0.1525 | 0.9921 | 945 | 1.1039 | 51181256 |
0.2363 | 0.9973 | 950 | 1.1055 | 51454752 |
### Framework versions
- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
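When reproducing these results, it may help to confirm the local environment matches the versions above. A small sanity-check sketch:

```python
# Environment sanity check (a convenience sketch, not from the original card).
import datasets
import tokenizers
import torch
import transformers

print(transformers.__version__)  # expected: 4.44.0
print(torch.__version__)         # expected: 2.4.0+cu121
print(datasets.__version__)      # expected: 2.20.0
print(tokenizers.__version__)    # expected: 0.19.1
```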