[2024-08-09 08:30:30,106][Main][INFO] - Distributed environment: NO Num processes: 1 Process index: 0 Local process index: 0 Device: cuda Mixed precision type: bf16
[2024-08-09 08:30:30,106][Main][INFO] - Working directory is /workspace/nanoT5/logs/2024-08-09/08-30-29-
[2024-08-09 08:38:01,730][Main][INFO] - [train] Step 50 out of 80000 | Loss --> 60.113 | Grad_l2 --> 186.709 | Weights_l2 --> 8624.587 | Lr --> 0.004 | Seconds_per_step --> 8.363 |
[2024-08-09 08:42:09,928][Main][INFO] - [train] Step 100 out of 80000 | Loss --> 22.120 | Grad_l2 --> 47.074 | Weights_l2 --> 8624.166 | Lr --> 0.004 | Seconds_per_step --> 4.964 |
[2024-08-09 08:46:13,808][Main][INFO] - [train] Step 150 out of 80000 | Loss --> 12.856 | Grad_l2 --> 28.865 | Weights_l2 --> 8623.587 | Lr --> 0.004 | Seconds_per_step --> 4.878 |
[2024-08-09 08:50:08,941][Main][INFO] - [train] Step 200 out of 80000 | Loss --> 10.357 | Grad_l2 --> 30.528 | Weights_l2 --> 8623.073 | Lr --> 0.004 | Seconds_per_step --> 4.703 |
[2024-08-09 08:54:06,924][Main][INFO] - [train] Step 250 out of 80000 | Loss --> 8.792 | Grad_l2 --> 17.202 | Weights_l2 --> 8622.533 | Lr --> 0.004 | Seconds_per_step --> 4.760 |
[2024-08-09 08:58:12,688][Main][INFO] - [train] Step 300 out of 80000 | Loss --> 7.720 | Grad_l2 --> 12.189 | Weights_l2 --> 8622.034 | Lr --> 0.004 | Seconds_per_step --> 4.915 |
[2024-08-09 09:02:09,434][Main][INFO] - [train] Step 350 out of 80000 | Loss --> 7.276 | Grad_l2 --> 10.214 | Weights_l2 --> 8621.544 | Lr --> 0.004 | Seconds_per_step --> 4.735 |
[2024-08-09 09:06:02,511][Main][INFO] - [train] Step 400 out of 80000 | Loss --> 7.054 | Grad_l2 --> 10.111 | Weights_l2 --> 8621.091 | Lr --> 0.004 | Seconds_per_step --> 4.662 |
[2024-08-09 09:10:08,058][Main][INFO] - [train] Step 450 out of 80000 | Loss --> 6.941 | Grad_l2 --> 9.960 | Weights_l2 --> 8620.672 | Lr --> 0.004 | Seconds_per_step --> 4.911 |
[2024-08-09 09:14:09,465][Main][INFO] - [train] Step 500 out of 80000 | Loss --> 6.777 | Grad_l2 --> 9.558 | Weights_l2 --> 8620.252 | Lr --> 0.004 | Seconds_per_step --> 4.828 |
[2024-08-09 09:17:58,397][Main][INFO] - [train] Step 550 out of 80000 | Loss --> 6.730 | Grad_l2 --> 9.024 | Weights_l2 --> 8619.864 | Lr --> 0.004 | Seconds_per_step --> 4.579 |
[2024-08-09 09:21:50,846][Main][INFO] - [train] Step 600 out of 80000 | Loss --> 6.626 | Grad_l2 --> 7.926 | Weights_l2 --> 8619.457 | Lr --> 0.004 | Seconds_per_step --> 4.649 |
[2024-08-09 09:25:58,874][Main][INFO] - [train] Step 650 out of 80000 | Loss --> 6.504 | Grad_l2 --> 6.422 | Weights_l2 --> 8619.040 | Lr --> 0.004 | Seconds_per_step --> 4.961 |
[2024-08-09 09:29:54,562][Main][INFO] - [train] Step 700 out of 80000 | Loss --> 6.425 | Grad_l2 --> 6.909 | Weights_l2 --> 8618.645 | Lr --> 0.004 | Seconds_per_step --> 4.714 |
[2024-08-09 09:33:50,054][Main][INFO] - [train] Step 750 out of 80000 | Loss --> 6.413 | Grad_l2 --> 6.699 | Weights_l2 --> 8618.254 | Lr --> 0.004 | Seconds_per_step --> 4.710 |
[2024-08-09 09:37:48,669][Main][INFO] - [train] Step 800 out of 80000 | Loss --> 6.339 | Grad_l2 --> 4.883 | Weights_l2 --> 8617.828 | Lr --> 0.004 | Seconds_per_step --> 4.772 |
[2024-08-09 09:41:53,765][Main][INFO] - [train] Step 850 out of 80000 | Loss --> 6.305 | Grad_l2 --> 5.402 | Weights_l2 --> 8617.423 | Lr --> 0.004 | Seconds_per_step --> 4.902 |
[2024-08-09 09:45:52,215][Main][INFO] - [train] Step 900 out of 80000 | Loss --> 6.254 | Grad_l2 --> 5.631 | Weights_l2 --> 8617.040 | Lr --> 0.004 | Seconds_per_step --> 4.769 |
[2024-08-09 09:49:47,148][Main][INFO] - [train] Step 950 out of 80000 | Loss --> 6.232 | Grad_l2 --> 5.005 | Weights_l2 --> 8616.646 | Lr --> 0.004 | Seconds_per_step --> 4.699 |
[2024-08-09 09:53:46,382][Main][INFO] - [train] Step 1000 out of 80000 | Loss --> 6.170 | Grad_l2 --> 5.456 | Weights_l2 --> 8616.274 | Lr --> 0.004 | Seconds_per_step --> 4.785 |
[2024-08-09 09:57:42,782][Main][INFO] - [train] Step 1050 out of 80000 | Loss --> 6.163 | Grad_l2 --> 3.954 | Weights_l2 --> 8615.859 | Lr --> 0.004 | Seconds_per_step --> 4.728 |
[2024-08-09 10:01:39,784][Main][INFO] - [train] Step 1100 out of 80000 | Loss --> 6.153 | Grad_l2 --> 4.661 | Weights_l2 --> 8615.485 | Lr --> 0.004 | Seconds_per_step --> 4.740 |
[2024-08-09 10:05:37,074][Main][INFO] - [train] Step 1150 out of 80000 | Loss --> 6.120 | Grad_l2 --> 4.405 | Weights_l2 --> 8615.110 | Lr --> 0.004 | Seconds_per_step --> 4.746 |
[2024-08-09 10:09:42,375][Main][INFO] - [train] Step 1200 out of 80000 | Loss --> 6.095 | Grad_l2 --> 4.862 | Weights_l2 --> 8614.756 | Lr --> 0.004 | Seconds_per_step --> 4.906 |
[2024-08-09 10:13:44,826][Main][INFO] - [train] Step 1250 out of 80000 | Loss --> 6.065 | Grad_l2 --> 3.995 | Weights_l2 --> 8614.382 | Lr --> 0.004 | Seconds_per_step --> 4.849 |
[2024-08-09 10:17:45,169][Main][INFO] - [train] Step 1300 out of 80000 | Loss --> 5.987 | Grad_l2 --> 4.501 | Weights_l2 --> 8614.025 | Lr --> 0.005 | Seconds_per_step --> 4.807 |
[2024-08-09 10:21:46,890][Main][INFO] - [train] Step 1350 out of 80000 | Loss --> 6.011 | Grad_l2 --> 4.330 | Weights_l2 --> 8613.671 | Lr --> 0.005 | Seconds_per_step --> 4.834 |
[2024-08-09 10:25:46,445][Main][INFO] - [train] Step 1400 out of 80000 | Loss --> 5.968 | Grad_l2 --> 4.033 | Weights_l2 --> 8613.308 | Lr --> 0.005 | Seconds_per_step --> 4.791 |
[2024-08-09 10:29:35,135][Main][INFO] - [train] Step 1450 out of 80000 | Loss --> 5.965 | Grad_l2 --> 3.817 | Weights_l2 --> 8612.959 | Lr --> 0.005 | Seconds_per_step --> 4.574 |
[2024-08-09 10:33:33,627][Main][INFO] - [train] Step 1500 out of 80000 | Loss --> 5.926 | Grad_l2 --> 3.525 | Weights_l2 --> 8612.605 | Lr --> 0.005 | Seconds_per_step --> 4.770 |
[2024-08-09 10:37:31,600][Main][INFO] - [train] Step 1550 out of 80000 | Loss --> 5.908 | Grad_l2 --> 3.178 | Weights_l2 --> 8612.265 | Lr --> 0.005 | Seconds_per_step --> 4.759 |
[2024-08-09 10:41:26,179][Main][INFO] - [train] Step 1600 out of 80000 | Loss --> 5.878 | Grad_l2 --> 3.430 | Weights_l2 --> 8611.930 | Lr --> 0.005 | Seconds_per_step --> 4.692 |
[2024-08-09 10:45:17,990][Main][INFO] - [train] Step 1650 out of 80000 | Loss --> 5.864 | Grad_l2 --> 3.399 | Weights_l2 --> 8611.598 | Lr --> 0.005 | Seconds_per_step --> 4.636 |
[2024-08-09 10:49:16,915][Main][INFO] - [train] Step 1700 out of 80000 | Loss --> 5.845 | Grad_l2 --> 3.266 | Weights_l2 --> 8611.279 | Lr --> 0.005 | Seconds_per_step --> 4.778 |
[2024-08-09 10:53:22,739][Main][INFO] - [train] Step 1750 out of 80000 | Loss --> 5.815 | Grad_l2 --> 3.539 | Weights_l2 --> 8610.973 | Lr --> 0.005 | Seconds_per_step --> 4.916 |
[2024-08-09 10:57:15,819][Main][INFO] - [train] Step 1800 out of 80000 | Loss --> 5.813 | Grad_l2 --> 3.014 | Weights_l2 --> 8610.660 | Lr --> 0.005 | Seconds_per_step --> 4.662 |
[2024-08-09 11:01:07,812][Main][INFO] - [train] Step 1850 out of 80000 | Loss --> 5.781 | Grad_l2 --> 3.157 | Weights_l2 --> 8610.357 | Lr --> 0.005 | Seconds_per_step --> 4.640 |
[2024-08-09 11:05:06,130][Main][INFO] - [train] Step 1900 out of 80000 | Loss --> 5.781 | Grad_l2 --> 2.876 | Weights_l2 --> 8610.069 | Lr --> 0.005 | Seconds_per_step --> 4.766 |
[2024-08-09 11:09:10,053][Main][INFO] - [train] Step 1950 out of 80000 | Loss --> 5.727 | Grad_l2 --> 3.171 | Weights_l2 --> 8609.783 | Lr --> 0.005 | Seconds_per_step --> 4.878 |
[2024-08-09 11:13:04,823][Main][INFO] - [train] Step 2000 out of 80000 | Loss --> 5.701 | Grad_l2 --> 3.384 | Weights_l2 --> 8609.494 | Lr --> 0.005 | Seconds_per_step --> 4.695 |
[2024-08-09 11:16:58,015][Main][INFO] - [train] Step 2050 out of 80000 | Loss --> 5.706 | Grad_l2 --> 2.739 | Weights_l2 --> 8609.191 | Lr --> 0.005 | Seconds_per_step --> 4.664 |
[2024-08-09 11:21:09,220][Main][INFO] - [train] Step 2100 out of 80000 | Loss --> 5.697 | Grad_l2 --> 2.753 | Weights_l2 --> 8608.924 | Lr --> 0.005 | Seconds_per_step --> 5.024 |
[2024-08-09 11:24:59,988][Main][INFO] - [train] Step 2150 out of 80000 | Loss --> 5.679 | Grad_l2 --> 2.713 | Weights_l2 --> 8608.657 | Lr --> 0.005 | Seconds_per_step --> 4.615 |
[2024-08-09 11:28:50,211][Main][INFO] - [train] Step 2200 out of 80000 | Loss --> 5.659 | Grad_l2 --> 2.789 | Weights_l2 --> 8608.401 | Lr --> 0.005 | Seconds_per_step --> 4.604 |
[2024-08-09 11:32:47,428][Main][INFO] - [train] Step 2250 out of 80000 | Loss --> 5.643 | Grad_l2 --> 3.085 | Weights_l2 --> 8608.150 | Lr --> 0.005 | Seconds_per_step --> 4.744 |
[2024-08-09 11:36:52,444][Main][INFO] - [train] Step 2300 out of 80000 | Loss --> 5.606 | Grad_l2 --> 3.170 | Weights_l2 --> 8607.880 | Lr --> 0.005 | Seconds_per_step --> 4.900 |
[2024-08-09 11:40:40,829][Main][INFO] - [train] Step 2350 out of 80000 | Loss --> 5.585 | Grad_l2 --> 2.834 | Weights_l2 --> 8607.632 | Lr --> 0.005 | Seconds_per_step --> 4.568 |
[2024-08-09 11:44:35,220][Main][INFO] - [train] Step 2400 out of 80000 | Loss --> 5.595 | Grad_l2 --> 2.603 | Weights_l2 --> 8607.391 | Lr --> 0.005 | Seconds_per_step --> 4.688 |
[2024-08-09 11:47:52,825][Main][INFO] - [train] Step 2450 out of 80000 | Loss --> 5.571 | Grad_l2 --> 2.616 | Weights_l2 --> 8607.146 | Lr --> 0.005 | Seconds_per_step --> 3.952 |
[2024-08-09 11:50:42,712][Main][INFO] - [train] Step 2500 out of 80000 | Loss --> 5.588 | Grad_l2 --> 2.392 | Weights_l2 --> 8606.913 | Lr --> 0.005 | Seconds_per_step --> 3.398 |
[2024-08-09 11:54:19,840][Main][INFO] - [train] Step 2550 out of 80000 | Loss --> 5.598 | Grad_l2 --> 3.058 | Weights_l2 --> 8606.708 | Lr --> 0.005 | Seconds_per_step --> 4.343 |
[2024-08-09 11:58:07,896][Main][INFO] - [train] Step 2600 out of 80000 | Loss --> 5.554 | Grad_l2 --> 2.508 | Weights_l2 --> 8606.498 | Lr --> 0.005 | Seconds_per_step --> 4.561 |
[2024-08-09 12:02:07,989][Main][INFO] - [train] Step 2650 out of 80000 | Loss --> 5.536 | Grad_l2 --> 2.317 | Weights_l2 --> 8606.300 | Lr --> 0.005 | Seconds_per_step --> 4.802 |
[2024-08-09 12:06:22,355][Main][INFO] - [train] Step 2700 out of 80000 | Loss --> 5.533 | Grad_l2 --> 2.347 | Weights_l2 --> 8606.121 | Lr --> 0.005 | Seconds_per_step --> 5.087 |
[2024-08-09 12:10:05,296][Main][INFO] - [train] Step 2750 out of 80000 | Loss --> 5.502 | Grad_l2 --> 2.522 | Weights_l2 --> 8605.932 | Lr --> 0.005 | Seconds_per_step --> 4.459 |
[2024-08-09 12:13:56,942][Main][INFO] - [train] Step 2800 out of 80000 | Loss --> 5.484 | Grad_l2 --> 2.503 | Weights_l2 --> 8605.729 | Lr --> 0.005 | Seconds_per_step --> 4.633 |
[2024-08-09 12:17:56,310][Main][INFO] - [train] Step 2850 out of 80000 | Loss --> 5.471 | Grad_l2 --> 2.559 | Weights_l2 --> 8605.524 | Lr --> 0.005 | Seconds_per_step --> 4.787 |
[2024-08-09 12:21:50,249][Main][INFO] - [train] Step 2900 out of 80000 | Loss --> 5.463 | Grad_l2 --> 2.446 | Weights_l2 --> 8605.344 | Lr --> 0.005 | Seconds_per_step --> 4.679 |
[2024-08-09 12:25:43,300][Main][INFO] - [train] Step 2950 out of 80000 | Loss --> 5.481 | Grad_l2 --> 2.152 | Weights_l2 --> 8605.182 | Lr --> 0.005 | Seconds_per_step --> 4.661 |
[2024-08-09 12:29:34,779][Main][INFO] - [train] Step 3000 out of 80000 | Loss --> 5.444 | Grad_l2 --> 2.267 | Weights_l2 --> 8605.025 | Lr --> 0.005 | Seconds_per_step --> 4.630 |
[2024-08-09 12:33:43,889][Main][INFO] - [train] Step 3050 out of 80000 | Loss --> 5.445 | Grad_l2 --> 2.029 | Weights_l2 --> 8604.870 | Lr --> 0.005 | Seconds_per_step --> 4.982 |
[2024-08-09 12:37:33,552][Main][INFO] - [train] Step 3100 out of 80000 | Loss --> 5.439 | Grad_l2 --> 2.249 | Weights_l2 --> 8604.734 | Lr --> 0.005 | Seconds_per_step --> 4.593 |
[2024-08-09 12:41:33,458][Main][INFO] - [train] Step 3150 out of 80000 | Loss --> 5.390 | Grad_l2 --> 2.281 | Weights_l2 --> 8604.574 | Lr --> 0.005 | Seconds_per_step --> 4.798 |
[2024-08-09 12:45:28,169][Main][INFO] - [train] Step 3200 out of 80000 | Loss --> 5.395 | Grad_l2 --> 2.124 | Weights_l2 --> 8604.424 | Lr --> 0.005 | Seconds_per_step --> 4.694 |
[2024-08-09 12:49:31,716][Main][INFO] - [train] Step 3250 out of 80000 | Loss --> 5.381 | Grad_l2 --> 2.379 | Weights_l2 --> 8604.286 | Lr --> 0.005 | Seconds_per_step --> 4.871 |
[2024-08-09 12:53:26,686][Main][INFO] - [train] Step 3300 out of 80000 | Loss --> 5.365 | Grad_l2 --> 2.335 | Weights_l2 --> 8604.130 | Lr --> 0.005 | Seconds_per_step --> 4.699 |
[2024-08-09 12:57:18,564][Main][INFO] - [train] Step 3350 out of 80000 | Loss --> 5.365 | Grad_l2 --> 2.185 | Weights_l2 --> 8603.989 | Lr --> 0.005 | Seconds_per_step --> 4.638 |
[2024-08-09 13:01:23,837][Main][INFO] - [train] Step 3400 out of 80000 | Loss --> 5.347 | Grad_l2 --> 2.330 | Weights_l2 --> 8603.845 | Lr --> 0.005 | Seconds_per_step --> 4.905 |
[2024-08-09 13:05:16,575][Main][INFO] - [train] Step 3450 out of 80000 | Loss --> 5.349 | Grad_l2 --> 1.951 | Weights_l2 --> 8603.727 | Lr --> 0.005 | Seconds_per_step --> 4.655 |
[2024-08-09 13:08:27,542][Main][INFO] - [train] Step 3500 out of 80000 | Loss --> 5.356 | Grad_l2 --> 1.986 | Weights_l2 --> 8603.662 | Lr --> 0.005 | Seconds_per_step --> 3.819 |
[2024-08-09 13:12:30,541][Main][INFO] - [train] Step 3550 out of 80000 | Loss --> 5.312 | Grad_l2 --> 2.396 | Weights_l2 --> 8603.545 | Lr --> 0.005 | Seconds_per_step --> 4.860 |
[2024-08-09 13:16:49,213][Main][INFO] - [train] Step 3600 out of 80000 | Loss --> 5.299 | Grad_l2 --> 2.230 | Weights_l2 --> 8603.411 | Lr --> 0.005 | Seconds_per_step --> 5.173 |
[2024-08-09 13:20:53,058][Main][INFO] - [train] Step 3650 out of 80000 | Loss --> 5.307 | Grad_l2 --> 2.386 | Weights_l2 --> 8603.284 | Lr --> 0.005 | Seconds_per_step --> 4.877 |
[2024-08-09 13:24:44,487][Main][INFO] - [train] Step 3700 out of 80000 | Loss --> 5.293 | Grad_l2 --> 2.071 | Weights_l2 --> 8603.169 | Lr --> 0.005 | Seconds_per_step --> 4.629 |
[2024-08-09 13:28:47,607][Main][INFO] - [train] Step 3750 out of 80000 | Loss --> 5.298 | Grad_l2 --> 2.199 | Weights_l2 --> 8603.065 | Lr --> 0.005 | Seconds_per_step --> 4.862 |
[2024-08-09 13:32:52,512][Main][INFO] - [train] Step 3800 out of 80000 | Loss --> 5.277 | Grad_l2 --> 2.091 | Weights_l2 --> 8602.962 | Lr --> 0.006 | Seconds_per_step --> 4.898 |
[2024-08-09 13:36:42,719][Main][INFO] - [train] Step 3850 out of 80000 | Loss --> 5.284 | Grad_l2 --> 2.042 | Weights_l2 --> 8602.881 | Lr --> 0.006 | Seconds_per_step --> 4.604 |
[2024-08-09 13:40:34,318][Main][INFO] - [train] Step 3900 out of 80000 | Loss --> 5.245 | Grad_l2 --> 2.240 | Weights_l2 --> 8602.781 | Lr --> 0.006 | Seconds_per_step --> 4.632 |
[2024-08-09 13:44:45,754][Main][INFO] - [train] Step 3950 out of 80000 | Loss --> 5.245 | Grad_l2 --> 1.955 | Weights_l2 --> 8602.686 | Lr --> 0.006 | Seconds_per_step --> 5.029 |
[2024-08-09 13:48:39,099][Main][INFO] - [train] Step 4000 out of 80000 | Loss --> 5.257 | Grad_l2 --> 2.011 | Weights_l2 --> 8602.644 | Lr --> 0.006 | Seconds_per_step --> 4.667 |
[2024-08-09 13:52:31,353][Main][INFO] - [train] Step 4050 out of 80000 | Loss --> 5.239 | Grad_l2 --> 1.838 | Weights_l2 --> 8602.573 | Lr --> 0.006 | Seconds_per_step --> 4.645 |
[2024-08-09 13:56:29,186][Main][INFO] - [train] Step 4100 out of 80000 | Loss --> 5.238 | Grad_l2 --> 1.935 | Weights_l2 --> 8602.540 | Lr --> 0.006 | Seconds_per_step --> 4.757 |
[2024-08-09 14:00:27,682][Main][INFO] - [train] Step 4150 out of 80000 | Loss --> 5.211 | Grad_l2 --> 2.014 | Weights_l2 --> 8602.468 | Lr --> 0.006 | Seconds_per_step --> 4.770 |
[2024-08-09 14:04:26,879][Main][INFO] - [train] Step 4200 out of 80000 | Loss --> 5.202 | Grad_l2 --> 2.106 | Weights_l2 --> 8602.418 | Lr --> 0.006 | Seconds_per_step --> 4.784 |
[2024-08-09 14:08:26,097][Main][INFO] - [train] Step 4250 out of 80000 | Loss --> 5.194 | Grad_l2 --> 1.876 | Weights_l2 --> 8602.330 | Lr --> 0.006 | Seconds_per_step --> 4.784 |
[2024-08-09 14:12:43,883][Main][INFO] - [train] Step 4300 out of 80000 | Loss --> 5.216 | Grad_l2 --> 1.692 | Weights_l2 --> 8602.339 | Lr --> 0.006 | Seconds_per_step --> 5.156 |
[2024-08-09 14:16:59,892][Main][INFO] - [train] Step 4350 out of 80000 | Loss --> 5.195 | Grad_l2 --> 1.824 | Weights_l2 --> 8602.342 | Lr --> 0.006 | Seconds_per_step --> 5.120 |
[2024-08-09 14:20:57,072][Main][INFO] - [train] Step 4400 out of 80000 | Loss --> 5.193 | Grad_l2 --> 1.640 | Weights_l2 --> 8602.351 | Lr --> 0.006 | Seconds_per_step --> 4.744 |
[2024-08-09 14:25:01,683][Main][INFO] - [train] Step 4450 out of 80000 | Loss --> 5.186 | Grad_l2 --> 1.790 | Weights_l2 --> 8602.369 | Lr --> 0.006 | Seconds_per_step --> 4.892 |
[2024-08-09 14:29:08,638][Main][INFO] - [train] Step 4500 out of 80000 | Loss --> 5.162 | Grad_l2 --> 1.890 | Weights_l2 --> 8602.364 | Lr --> 0.006 | Seconds_per_step --> 4.939 |
[2024-08-09 14:32:58,390][Main][INFO] - [train] Step 4550 out of 80000 | Loss --> 5.136 | Grad_l2 --> 1.776 | Weights_l2 --> 8602.345 | Lr --> 0.006 | Seconds_per_step --> 4.595 |
[2024-08-09 14:37:00,248][Main][INFO] - [train] Step 4600 out of 80000 | Loss --> 5.135 | Grad_l2 --> 1.661 | Weights_l2 --> 8602.366 | Lr --> 0.006 | Seconds_per_step --> 4.837 |
[2024-08-09 14:41:11,560][Main][INFO] - [train] Step 4650 out of 80000 | Loss --> 5.139 | Grad_l2 --> 1.623 | Weights_l2 --> 8602.434 | Lr --> 0.006 | Seconds_per_step --> 5.026 |
[2024-08-09 14:45:14,951][Main][INFO] - [train] Step 4700 out of 80000 | Loss --> 5.090 | Grad_l2 --> 1.703 | Weights_l2 --> 8602.491 | Lr --> 0.006 | Seconds_per_step --> 4.868 |
[2024-08-09 14:49:09,655][Main][INFO] - [train] Step 4750 out of 80000 | Loss --> 5.056 | Grad_l2 --> 1.918 | Weights_l2 --> 8602.542 | Lr --> 0.006 | Seconds_per_step --> 4.694 |
[2024-08-09 14:53:11,228][Main][INFO] - [train] Step 4800 out of 80000 | Loss --> 5.018 | Grad_l2 --> 1.805 | Weights_l2 --> 8602.552 | Lr --> 0.006 | Seconds_per_step --> 4.831 |
[2024-08-09 14:57:15,004][Main][INFO] - [train] Step 4850 out of 80000 | Loss --> 5.016 | Grad_l2 --> 1.660 | Weights_l2 --> 8602.639 | Lr --> 0.006 | Seconds_per_step --> 4.876 |
[2024-08-09 15:01:09,698][Main][INFO] - [train] Step 4900 out of 80000 | Loss --> 4.994 | Grad_l2 --> 1.595 | Weights_l2 --> 8602.806 | Lr --> 0.006 | Seconds_per_step --> 4.694 |
[2024-08-09 15:04:01,695][Main][INFO] - [train] Step 4950 out of 80000 | Loss --> 4.946 | Grad_l2 --> 1.783 | Weights_l2 --> 8602.949 | Lr --> 0.006 | Seconds_per_step --> 3.440 |
[2024-08-09 15:07:39,946][Main][INFO] - [train] Step 5000 out of 80000 | Loss --> 4.722 | Grad_l2 --> 1.590 | Weights_l2 --> 8603.165 | Lr --> 0.006 | Seconds_per_step --> 4.365 |
[2024-08-09 15:07:39,947][accelerate.accelerator][INFO] - Saving current state to checkpoint-pt-5000
[2024-08-09 15:07:39,951][accelerate.utils.other][WARNING] - Removed shared tensor {'encoder.embed_tokens.weight', 'decoder.embed_tokens.weight'} while saving. This should be OK, but check by verifying that you don't receive any warning while reloading
[2024-08-09 15:07:46,022][accelerate.checkpointing][INFO] - Model weights saved in checkpoint-pt-5000/model.safetensors
[2024-08-09 15:07:49,438][accelerate.checkpointing][INFO] - Optimizer state saved in checkpoint-pt-5000/optimizer.bin
[2024-08-09 15:07:49,439][accelerate.checkpointing][INFO] - Scheduler state saved in checkpoint-pt-5000/scheduler.bin
[2024-08-09 15:07:49,439][accelerate.checkpointing][INFO] - Sampler state for dataloader 0 saved in checkpoint-pt-5000/sampler.bin
[2024-08-09 15:07:49,439][accelerate.checkpointing][INFO] - Sampler state for dataloader 1 saved in checkpoint-pt-5000/sampler_1.bin
[2024-08-09 15:07:49,440][accelerate.checkpointing][INFO] - Random states saved in checkpoint-pt-5000/random_states_0.pkl
[2024-08-09 15:11:55,741][Main][INFO] - [train] Step 5050 out of 80000 | Loss --> 4.582 | Grad_l2 --> 1.679 | Weights_l2 --> 8603.473 | Lr --> 0.006 | Seconds_per_step --> 5.116 |
[2024-08-09 15:15:46,314][Main][INFO] - [train] Step 5100 out of 80000 | Loss --> 4.472 | Grad_l2 --> 1.636 | Weights_l2 --> 8603.746 | Lr --> 0.006 | Seconds_per_step --> 4.611 |
[2024-08-09 15:19:45,374][Main][INFO] - [train] Step 5150 out of 80000 | Loss --> 4.370 | Grad_l2 --> 1.523 | Weights_l2 --> 8604.092 | Lr --> 0.006 | Seconds_per_step --> 4.781 |
[2024-08-09 15:23:51,223][Main][INFO] - [train] Step 5200 out of 80000 | Loss --> 4.267 | Grad_l2 --> 1.542 | Weights_l2 --> 8604.440 | Lr --> 0.006 | Seconds_per_step --> 4.917 |
[2024-08-09 15:27:51,655][Main][INFO] - [train] Step 5250 out of 80000 | Loss --> 4.191 | Grad_l2 --> 1.477 | Weights_l2 --> 8604.872 | Lr --> 0.006 | Seconds_per_step --> 4.809 |
[2024-08-09 15:31:44,251][Main][INFO] - [train] Step 5300 out of 80000 | Loss --> 4.128 | Grad_l2 --> 1.490 | Weights_l2 --> 8605.306 | Lr --> 0.006 | Seconds_per_step --> 4.652 |
[2024-08-09 15:35:40,470][Main][INFO] - [train] Step 5350 out of 80000 | Loss --> 4.067 | Grad_l2 --> 1.397 | Weights_l2 --> 8605.776 | Lr --> 0.006 | Seconds_per_step --> 4.724 |
[2024-08-09 15:39:48,973][Main][INFO] - [train] Step 5400 out of 80000 | Loss --> 4.015 | Grad_l2 --> 1.239 | Weights_l2 --> 8606.428 | Lr --> 0.006 | Seconds_per_step --> 4.970 |
[2024-08-09 15:43:39,070][Main][INFO] - [train] Step 5450 out of 80000 | Loss --> 3.968 | Grad_l2 --> 1.219 | Weights_l2 --> 8607.147 | Lr --> 0.006 | Seconds_per_step --> 4.602 |
[2024-08-09 15:47:34,049][Main][INFO] - [train] Step 5500 out of 80000 | Loss --> 3.903 | Grad_l2 --> 1.203 | Weights_l2 --> 8607.924 | Lr --> 0.006 | Seconds_per_step --> 4.700 |
[2024-08-09 15:51:38,499][Main][INFO] - [train] Step 5550 out of 80000 | Loss --> 3.855 | Grad_l2 --> 1.167 | Weights_l2 --> 8608.720 | Lr --> 0.006 | Seconds_per_step --> 4.889 |
[2024-08-09 15:55:46,120][Main][INFO] - [train] Step 5600 out of 80000 | Loss --> 3.815 | Grad_l2 --> 1.111 | Weights_l2 --> 8609.615 | Lr --> 0.006 | Seconds_per_step --> 4.952 |
[2024-08-09 15:59:40,828][Main][INFO] - [train] Step 5650 out of 80000 | Loss --> 3.768 | Grad_l2 --> 1.066 | Weights_l2 --> 8610.530 | Lr --> 0.006 | Seconds_per_step --> 4.694 |
[2024-08-09 16:03:38,938][Main][INFO] - [train] Step 5700 out of 80000 | Loss --> 3.711 | Grad_l2 --> 1.048 | Weights_l2 --> 8611.436 | Lr --> 0.006 | Seconds_per_step --> 4.762 |
[2024-08-09 16:07:49,871][Main][INFO] - [train] Step 5750 out of 80000 | Loss --> 3.675 | Grad_l2 --> 0.998 | Weights_l2 --> 8612.404 | Lr --> 0.006 | Seconds_per_step --> 5.019 |
[2024-08-09 16:11:53,420][Main][INFO] - [train] Step 5800 out of 80000 | Loss --> 3.625 | Grad_l2 --> 0.993 | Weights_l2 --> 8613.329 | Lr --> 0.006 | Seconds_per_step --> 4.871 |
[2024-08-09 16:15:50,534][Main][INFO] - [train] Step 5850 out of 80000 | Loss --> 3.580 | Grad_l2 --> 0.952 | Weights_l2 --> 8614.289 | Lr --> 0.006 | Seconds_per_step --> 4.742 |
[2024-08-09 16:19:45,983][Main][INFO] - [train] Step 5900 out of 80000 | Loss --> 3.545 | Grad_l2 --> 1.014 | Weights_l2 --> 8615.197 | Lr --> 0.006 | Seconds_per_step --> 4.709 |
[2024-08-09 16:23:51,342][Main][INFO] - [train] Step 5950 out of 80000 | Loss --> 3.522 | Grad_l2 --> 0.927 | Weights_l2 --> 8616.137 | Lr --> 0.006 | Seconds_per_step --> 4.907 |
[2024-08-09 16:27:42,121][Main][INFO] - [train] Step 6000 out of 80000 | Loss --> 3.483 | Grad_l2 --> 0.926 | Weights_l2 --> 8617.066 | Lr --> 0.006 | Seconds_per_step --> 4.616 |
[2024-08-09 16:31:41,278][Main][INFO] - [train] Step 6050 out of 80000 | Loss --> 3.455 | Grad_l2 --> 0.886 | Weights_l2 --> 8617.977 | Lr --> 0.006 | Seconds_per_step --> 4.783 |
[2024-08-09 16:35:47,786][Main][INFO] - [train] Step 6100 out of 80000 | Loss --> 3.428 | Grad_l2 --> 0.956 | Weights_l2 --> 8618.840 | Lr --> 0.006 | Seconds_per_step --> 4.930 |
[2024-08-09 16:39:45,096][Main][INFO] - [train] Step 6150 out of 80000 | Loss --> 3.399 | Grad_l2 --> 0.832 | Weights_l2 --> 8619.684 | Lr --> 0.006 | Seconds_per_step --> 4.746 |
[2024-08-09 16:43:41,554][Main][INFO] - [train] Step 6200 out of 80000 | Loss --> 3.377 | Grad_l2 --> 0.868 | Weights_l2 --> 8620.530 | Lr --> 0.006 | Seconds_per_step --> 4.729 |
[2024-08-09 16:47:45,442][Main][INFO] - [train] Step 6250 out of 80000 | Loss --> 3.363 | Grad_l2 --> 0.850 | Weights_l2 --> 8621.325 | Lr --> 0.006 | Seconds_per_step --> 4.878 |
[2024-08-09 16:51:50,312][Main][INFO] - [train] Step 6300 out of 80000 | Loss --> 3.332 | Grad_l2 --> 0.840 | Weights_l2 --> 8622.117 | Lr --> 0.007 | Seconds_per_step --> 4.897 |
[2024-08-09 16:55:47,619][Main][INFO] - [train] Step 6350 out of 80000 | Loss --> 3.311 | Grad_l2 --> 0.875 | Weights_l2 --> 8622.932 | Lr --> 0.007 | Seconds_per_step --> 4.746 |
[2024-08-09 16:59:44,744][Main][INFO] - [train] Step 6400 out of 80000 | Loss --> 3.289 | Grad_l2 --> 0.808 | Weights_l2 --> 8623.729 | Lr --> 0.007 | Seconds_per_step --> 4.742 |
[2024-08-09 17:03:47,092][Main][INFO] - [train] Step 6450 out of 80000 | Loss --> 3.279 | Grad_l2 --> 0.782 | Weights_l2 --> 8624.498 | Lr --> 0.007 | Seconds_per_step --> 4.847 |
[2024-08-09 17:07:51,580][Main][INFO] - [train] Step 6500 out of 80000 | Loss --> 3.250 | Grad_l2 --> 0.812 | Weights_l2 --> 8625.266 | Lr --> 0.007 | Seconds_per_step --> 4.890 |
[2024-08-09 17:11:44,444][Main][INFO] - [train] Step 6550 out of 80000 | Loss --> 3.248 | Grad_l2 --> 0.806 | Weights_l2 --> 8626.024 | Lr --> 0.007 | Seconds_per_step --> 4.657 |
[2024-08-09 17:15:43,498][Main][INFO] - [train] Step 6600 out of 80000 | Loss --> 3.216 | Grad_l2 --> 0.765 | Weights_l2 --> 8626.794 | Lr --> 0.007 | Seconds_per_step --> 4.781 |
[2024-08-09 17:19:50,311][Main][INFO] - [train] Step 6650 out of 80000 | Loss --> 3.209 | Grad_l2 --> 0.793 | Weights_l2 --> 8627.521 | Lr --> 0.007 | Seconds_per_step --> 4.936 |
[2024-08-09 17:23:54,093][Main][INFO] - [train] Step 6700 out of 80000 | Loss --> 3.200 | Grad_l2 --> 0.788 | Weights_l2 --> 8628.294 | Lr --> 0.007 | Seconds_per_step --> 4.876 |
[2024-08-09 17:27:47,402][Main][INFO] - [train] Step 6750 out of 80000 | Loss --> 3.176 | Grad_l2 --> 0.762 | Weights_l2 --> 8629.053 | Lr --> 0.007 | Seconds_per_step --> 4.666 |
[2024-08-09 17:31:49,523][Main][INFO] - [train] Step 6800 out of 80000 | Loss --> 3.170 | Grad_l2 --> 0.778 | Weights_l2 --> 8629.825 | Lr --> 0.007 | Seconds_per_step --> 4.842 |
[2024-08-09 17:35:52,826][Main][INFO] - [train] Step 6850 out of 80000 | Loss --> 3.159 | Grad_l2 --> 0.775 | Weights_l2 --> 8630.568 | Lr --> 0.007 | Seconds_per_step --> 4.866 |
[2024-08-09 17:39:46,125][Main][INFO] - [train] Step 6900 out of 80000 | Loss --> 3.158 | Grad_l2 --> 0.757 | Weights_l2 --> 8631.325 | Lr --> 0.007 | Seconds_per_step --> 4.666 |
[2024-08-09 17:43:39,817][Main][INFO] - [train] Step 6950 out of 80000 | Loss --> 3.138 | Grad_l2 --> 0.766 | Weights_l2 --> 8632.055 | Lr --> 0.007 | Seconds_per_step --> 4.674 |
[2024-08-09 17:47:44,929][Main][INFO] - [train] Step 7000 out of 80000 | Loss --> 3.123 | Grad_l2 --> 0.759 | Weights_l2 --> 8632.805 | Lr --> 0.007 | Seconds_per_step --> 4.902 |
[2024-08-09 17:51:43,866][Main][INFO] - [train] Step 7050 out of 80000 | Loss --> 3.118 | Grad_l2 --> 0.752 | Weights_l2 --> 8633.540 | Lr --> 0.007 | Seconds_per_step --> 4.779 |
[2024-08-09 17:55:42,820][Main][INFO] - [train] Step 7100 out of 80000 | Loss --> 3.103 | Grad_l2 --> 0.757 | Weights_l2 --> 8634.285 | Lr --> 0.007 | Seconds_per_step --> 4.779 |
[2024-08-09 17:59:44,322][Main][INFO] - [train] Step 7150 out of 80000 | Loss --> 3.083 | Grad_l2 --> 0.755 | Weights_l2 --> 8635.030 | Lr --> 0.007 | Seconds_per_step --> 4.830 |
[2024-08-09 18:03:44,919][Main][INFO] - [train] Step 7200 out of 80000 | Loss --> 3.073 | Grad_l2 --> 0.735 | Weights_l2 --> 8635.760 | Lr --> 0.007 | Seconds_per_step --> 4.812 |
[2024-08-09 18:07:37,774][Main][INFO] - [train] Step 7250 out of 80000 | Loss --> 3.055 | Grad_l2 --> 0.718 | Weights_l2 --> 8636.493 | Lr --> 0.007 | Seconds_per_step --> 4.657 |
[2024-08-09 18:11:34,198][Main][INFO] - [train] Step 7300 out of 80000 | Loss --> 3.051 | Grad_l2 --> 0.721 | Weights_l2 --> 8637.245 | Lr --> 0.007 | Seconds_per_step --> 4.728 |
[2024-08-09 18:15:38,927][Main][INFO] - [train] Step 7350 out of 80000 | Loss --> 3.041 | Grad_l2 --> 0.762 | Weights_l2 --> 8637.991 | Lr --> 0.007 | Seconds_per_step --> 4.895 |
[2024-08-09 18:19:42,181][Main][INFO] - [train] Step 7400 out of 80000 | Loss --> 3.031 | Grad_l2 --> 0.720 | Weights_l2 --> 8638.728 | Lr --> 0.007 | Seconds_per_step --> 4.865 |
[2024-08-09 18:23:37,911][Main][INFO] - [train] Step 7450 out of 80000 | Loss --> 3.033 | Grad_l2 --> 0.718 | Weights_l2 --> 8639.471 | Lr --> 0.007 | Seconds_per_step --> 4.715 |
[2024-08-09 18:27:38,146][Main][INFO] - [train] Step 7500 out of 80000 | Loss --> 3.020 | Grad_l2 --> 0.729 | Weights_l2 --> 8640.206 | Lr --> 0.007 | Seconds_per_step --> 4.805 |
[2024-08-09 18:31:39,590][Main][INFO] - [train] Step 7550 out of 80000 | Loss --> 3.004 | Grad_l2 --> 0.734 | Weights_l2 --> 8640.967 | Lr --> 0.007 | Seconds_per_step --> 4.829 |
[2024-08-09 18:35:32,805][Main][INFO] - [train] Step 7600 out of 80000 | Loss --> 2.986 | Grad_l2 --> 0.714 | Weights_l2 --> 8641.711 | Lr --> 0.007 | Seconds_per_step --> 4.664 |
[2024-08-09 18:39:28,080][Main][INFO] - [train] Step 7650 out of 80000 | Loss --> 2.994 | Grad_l2 --> 0.743 | Weights_l2 --> 8642.483 | Lr --> 0.007 | Seconds_per_step --> 4.705 |
[2024-08-09 18:43:37,815][Main][INFO] - [train] Step 7700 out of 80000 | Loss --> 2.980 | Grad_l2 --> 0.699 | Weights_l2 --> 8643.242 | Lr --> 0.007 | Seconds_per_step --> 4.995 |
[2024-08-09 18:47:42,799][Main][INFO] - [train] Step 7750 out of 80000 | Loss --> 2.976 | Grad_l2 --> 0.725 | Weights_l2 --> 8643.993 | Lr --> 0.007 | Seconds_per_step --> 4.900 |
[2024-08-09 18:51:34,464][Main][INFO] - [train] Step 7800 out of 80000 | Loss --> 2.963 | Grad_l2 --> 0.699 | Weights_l2 --> 8644.781 | Lr --> 0.007 | Seconds_per_step --> 4.633 |
[2024-08-09 18:55:32,534][Main][INFO] - [train] Step 7850 out of 80000 | Loss --> 2.954 | Grad_l2 --> 0.706 | Weights_l2 --> 8645.547 | Lr --> 0.007 | Seconds_per_step --> 4.761 |
[2024-08-09 18:59:39,507][Main][INFO] - [train] Step 7900 out of 80000 | Loss --> 2.947 | Grad_l2 --> 0.689 | Weights_l2 --> 8646.333 | Lr --> 0.007 | Seconds_per_step --> 4.939 |
[2024-08-09 19:03:32,747][Main][INFO] - [train] Step 7950 out of 80000 | Loss --> 2.935 | Grad_l2 --> 0.701 | Weights_l2 --> 8647.099 | Lr --> 0.007 | Seconds_per_step --> 4.665 |
[2024-08-09 19:07:42,994][Main][INFO] - [train] Step 8000 out of 80000 | Loss --> 2.940 | Grad_l2 --> 0.709 | Weights_l2 --> 8647.889 | Lr --> 0.007 | Seconds_per_step --> 5.005 |
[2024-08-09 19:11:49,930][Main][INFO] - [train] Step 8050 out of 80000 | Loss --> 2.919 | Grad_l2 --> 0.699 | Weights_l2 --> 8648.663 | Lr --> 0.007 | Seconds_per_step --> 4.939 |
[2024-08-09 19:16:03,022][Main][INFO] - [train] Step 8100 out of 80000 | Loss --> 2.916 | Grad_l2 --> 0.690 | Weights_l2 --> 8649.453 | Lr --> 0.007 | Seconds_per_step --> 5.062 |
[2024-08-09 19:20:05,203][Main][INFO] - [train] Step 8150 out of 80000 | Loss --> 2.914 | Grad_l2 --> 0.712 | Weights_l2 --> 8650.238 | Lr --> 0.007 | Seconds_per_step --> 4.844 |
[2024-08-09 19:23:57,007][Main][INFO] - [train] Step 8200 out of 80000 | Loss --> 2.903 | Grad_l2 --> 0.727 | Weights_l2 --> 8651.038 | Lr --> 0.007 | Seconds_per_step --> 4.636 |
[2024-08-09 19:28:02,052][Main][INFO] - [train] Step 8250 out of 80000 | Loss --> 2.896 | Grad_l2 --> 0.691 | Weights_l2 --> 8651.842 | Lr --> 0.007 | Seconds_per_step --> 4.901 |
[2024-08-09 19:32:01,708][Main][INFO] - [train] Step 8300 out of 80000 | Loss --> 2.889 | Grad_l2 --> 0.703 | Weights_l2 --> 8652.661 | Lr --> 0.007 | Seconds_per_step --> 4.793 |
[2024-08-09 19:35:54,542][Main][INFO] - [train] Step 8350 out of 80000 | Loss --> 2.882 | Grad_l2 --> 0.672 | Weights_l2 --> 8653.459 | Lr --> 0.007 | Seconds_per_step --> 4.657 |
[2024-08-09 19:39:53,565][Main][INFO] - [train] Step 8400 out of 80000 | Loss --> 2.861 | Grad_l2 --> 0.676 | Weights_l2 --> 8654.299 | Lr --> 0.007 | Seconds_per_step --> 4.780 |
[2024-08-09 19:43:54,929][Main][INFO] - [train] Step 8450 out of 80000 | Loss --> 2.870 | Grad_l2 --> 0.680 | Weights_l2 --> 8655.106 | Lr --> 0.007 | Seconds_per_step --> 4.827 |
[2024-08-09 19:47:46,390][Main][INFO] - [train] Step 8500 out of 80000 | Loss --> 2.857 | Grad_l2 --> 0.673 | Weights_l2 --> 8655.929 | Lr --> 0.007 | Seconds_per_step --> 4.629 |
[2024-08-09 19:51:41,774][Main][INFO] - [train] Step 8550 out of 80000 | Loss --> 2.847 | Grad_l2 --> 0.674 | Weights_l2 --> 8656.760 | Lr --> 0.007 | Seconds_per_step --> 4.708 |
[2024-08-09 19:55:50,508][Main][INFO] - [train] Step 8600 out of 80000 | Loss --> 2.838 | Grad_l2 --> 0.679 | Weights_l2 --> 8657.613 | Lr --> 0.007 | Seconds_per_step --> 4.975 |
[2024-08-09 19:59:55,899][Main][INFO] - [train] Step 8650 out of 80000 | Loss --> 2.847 | Grad_l2 --> 0.668 | Weights_l2 --> 8658.480 | Lr --> 0.007 | Seconds_per_step --> 4.908 |
[2024-08-09 20:03:46,940][Main][INFO] - [train] Step 8700 out of 80000 | Loss --> 2.834 | Grad_l2 --> 0.689 | Weights_l2 --> 8659.322 | Lr --> 0.007 | Seconds_per_step --> 4.621 |
[2024-08-09 20:07:40,599][Main][INFO] - [train] Step 8750 out of 80000 | Loss --> 2.814 | Grad_l2 --> 0.665 | Weights_l2 --> 8660.208 | Lr --> 0.007 | Seconds_per_step --> 4.673 |
[2024-08-09 20:11:45,521][Main][INFO] - [train] Step 8800 out of 80000 | Loss --> 2.817 | Grad_l2 --> 0.645 | Weights_l2 --> 8661.057 | Lr --> 0.008 | Seconds_per_step --> 4.898 |
[2024-08-09 20:15:34,178][Main][INFO] - [train] Step 8850 out of 80000 | Loss --> 2.807 | Grad_l2 --> 0.662 | Weights_l2 --> 8661.931 | Lr --> 0.008 | Seconds_per_step --> 4.573 |
[2024-08-09 20:19:09,957][Main][INFO] - [train] Step 8900 out of 80000 | Loss --> 2.806 | Grad_l2 --> 0.671 | Weights_l2 --> 8662.810 | Lr --> 0.008 | Seconds_per_step --> 4.316 |
[2024-08-09 20:22:37,497][Main][INFO] - [train] Step 8950 out of 80000 | Loss --> 2.799 | Grad_l2 --> 0.656 | Weights_l2 --> 8663.699 | Lr --> 0.008 | Seconds_per_step --> 4.151 |
[2024-08-09 20:26:01,302][Main][INFO] - [train] Step 9000 out of 80000 | Loss --> 2.796 | Grad_l2 --> 0.657 | Weights_l2 --> 8664.591 | Lr --> 0.008 | Seconds_per_step --> 4.076 |
[2024-08-09 20:29:28,057][Main][INFO] - [train] Step 9050 out of 80000 | Loss --> 2.787 | Grad_l2 --> 0.650 | Weights_l2 --> 8665.480 | Lr --> 0.008 | Seconds_per_step --> 4.135 |
[2024-08-09 20:32:55,736][Main][INFO] - [train] Step 9100 out of 80000 | Loss --> 2.771 | Grad_l2 --> 0.668 | Weights_l2 --> 8666.372 | Lr --> 0.008 | Seconds_per_step --> 4.154 |
[2024-08-09 20:36:26,470][Main][INFO] - [train] Step 9150 out of 80000 | Loss --> 2.762 | Grad_l2 --> 0.630 | Weights_l2 --> 8667.256 | Lr --> 0.008 | Seconds_per_step --> 4.215 |
[2024-08-09 20:40:02,302][Main][INFO] - [train] Step 9200 out of 80000 | Loss --> 2.764 | Grad_l2 --> 0.668 | Weights_l2 --> 8668.181 | Lr --> 0.008 | Seconds_per_step --> 4.317 |
[2024-08-09 20:43:38,319][Main][INFO] - [train] Step 9250 out of 80000 | Loss --> 2.760 | Grad_l2 --> 0.658 | Weights_l2 --> 8669.118 | Lr --> 0.008 | Seconds_per_step --> 4.320 |
[2024-08-09 20:47:12,593][Main][INFO] - [train] Step 9300 out of 80000 | Loss --> 2.754 | Grad_l2 --> 0.631 | Weights_l2 --> 8670.046 | Lr --> 0.008 | Seconds_per_step --> 4.285 |
[2024-08-09 20:50:50,547][Main][INFO] - [train] Step 9350 out of 80000 | Loss --> 2.748 | Grad_l2 --> 0.659 | Weights_l2 --> 8670.961 | Lr --> 0.008 | Seconds_per_step --> 4.359 |
[2024-08-09 20:54:27,164][Main][INFO] - [train] Step 9400 out of 80000 | Loss --> 2.745 | Grad_l2 --> 0.645 | Weights_l2 --> 8671.908 | Lr --> 0.008 | Seconds_per_step --> 4.332 |
[2024-08-09 20:57:57,318][Main][INFO] - [train] Step 9450 out of 80000 | Loss --> 2.734 | Grad_l2 --> 0.651 | Weights_l2 --> 8672.837 | Lr --> 0.008 | Seconds_per_step --> 4.203 |
[2024-08-09 21:01:27,114][Main][INFO] - [train] Step 9500 out of 80000 | Loss --> 2.724 | Grad_l2 --> 0.651 | Weights_l2 --> 8673.783 | Lr --> 0.008 | Seconds_per_step --> 4.196 |
[2024-08-09 21:05:01,540][Main][INFO] - [train] Step 9550 out of 80000 | Loss --> 2.723 | Grad_l2 --> 0.635 | Weights_l2 --> 8674.757 | Lr --> 0.008 | Seconds_per_step --> 4.289 |
[2024-08-09 21:08:31,178][Main][INFO] - [train] Step 9600 out of 80000 | Loss --> 2.707 | Grad_l2 --> 0.633 | Weights_l2 --> 8675.741 | Lr --> 0.008 | Seconds_per_step --> 4.193 |
[2024-08-09 21:12:04,549][Main][INFO] - [train] Step 9650 out of 80000 | Loss --> 2.705 | Grad_l2 --> 0.662 | Weights_l2 --> 8676.698 | Lr --> 0.008 | Seconds_per_step --> 4.267 |
[2024-08-09 21:15:31,359][Main][INFO] - [train] Step 9700 out of 80000 | Loss --> 2.701 | Grad_l2 --> 0.620 | Weights_l2 --> 8677.665 | Lr --> 0.008 | Seconds_per_step --> 4.136 |
[2024-08-09 21:19:05,681][Main][INFO] - [train] Step 9750 out of 80000 | Loss --> 2.696 | Grad_l2 --> 0.635 | Weights_l2 --> 8678.669 | Lr --> 0.008 | Seconds_per_step --> 4.286 |
[2024-08-09 21:22:39,126][Main][INFO] - [train] Step 9800 out of 80000 | Loss --> 2.698 | Grad_l2 --> 0.652 | Weights_l2 --> 8679.660 | Lr --> 0.008 | Seconds_per_step --> 4.269 |
[2024-08-09 21:26:12,926][Main][INFO] - [train] Step 9850 out of 80000 | Loss --> 2.691 | Grad_l2 --> 0.629 | Weights_l2 --> 8680.657 | Lr --> 0.008 | Seconds_per_step --> 4.276 |
[2024-08-09 21:29:43,650][Main][INFO] - [train] Step 9900 out of 80000 | Loss --> 2.683 | Grad_l2 --> 0.639 | Weights_l2 --> 8681.671 | Lr --> 0.008 | Seconds_per_step --> 4.214 |
[2024-08-09 21:33:15,612][Main][INFO] - [train] Step 9950 out of 80000 | Loss --> 2.678 | Grad_l2 --> 0.624 | Weights_l2 --> 8682.710 | Lr --> 0.008 | Seconds_per_step --> 4.239 |
[2024-08-09 21:36:48,784][Main][INFO] - [train] Step 10000 out of 80000 | Loss --> 2.683 | Grad_l2 --> 0.631 | Weights_l2 --> 8683.746 | Lr --> 0.008 | Seconds_per_step --> 4.263 |
[2024-08-09 21:36:48,785][accelerate.accelerator][INFO] - Saving current state to checkpoint-pt-10000
[2024-08-09 21:36:48,789][accelerate.utils.other][WARNING] - Removed shared tensor {'encoder.embed_tokens.weight', 'decoder.embed_tokens.weight'} while saving.
This should be OK, but check by verifying that you don't receive any warning while reloading [2024-08-09 21:36:50,921][accelerate.checkpointing][INFO] - Model weights saved in checkpoint-pt-10000/model.safetensors [2024-08-09 21:36:54,146][accelerate.checkpointing][INFO] - Optimizer state saved in checkpoint-pt-10000/optimizer.bin [2024-08-09 21:36:54,146][accelerate.checkpointing][INFO] - Scheduler state saved in checkpoint-pt-10000/scheduler.bin [2024-08-09 21:36:54,146][accelerate.checkpointing][INFO] - Sampler state for dataloader 0 saved in checkpoint-pt-10000/sampler.bin [2024-08-09 21:36:54,147][accelerate.checkpointing][INFO] - Sampler state for dataloader 1 saved in checkpoint-pt-10000/sampler_1.bin [2024-08-09 21:36:54,147][accelerate.checkpointing][INFO] - Random states saved in checkpoint-pt-10000/random_states_0.pkl [2024-08-09 21:40:24,314][Main][INFO] - [train] Step 10050 out of 80000 | Loss --> 2.672 | Grad_l2 --> 0.620 | Weights_l2 --> 8684.763 | Lr --> 0.008 | Seconds_per_step --> 4.311 | [2024-08-09 21:43:54,934][Main][INFO] - [train] Step 10100 out of 80000 | Loss --> 2.668 | Grad_l2 --> 0.630 | Weights_l2 --> 8685.788 | Lr --> 0.008 | Seconds_per_step --> 4.212 | [2024-08-09 21:47:26,893][Main][INFO] - [train] Step 10150 out of 80000 | Loss --> 2.664 | Grad_l2 --> 0.622 | Weights_l2 --> 8686.819 | Lr --> 0.008 | Seconds_per_step --> 4.239 | [2024-08-09 21:50:55,047][Main][INFO] - [train] Step 10200 out of 80000 | Loss --> 2.647 | Grad_l2 --> 0.609 | Weights_l2 --> 8687.859 | Lr --> 0.008 | Seconds_per_step --> 4.163 | [2024-08-09 21:54:22,462][Main][INFO] - [train] Step 10250 out of 80000 | Loss --> 2.655 | Grad_l2 --> 0.613 | Weights_l2 --> 8688.883 | Lr --> 0.008 | Seconds_per_step --> 4.148 | [2024-08-09 21:57:52,835][Main][INFO] - [train] Step 10300 out of 80000 | Loss --> 2.637 | Grad_l2 --> 0.623 | Weights_l2 --> 8689.917 | Lr --> 0.008 | Seconds_per_step --> 4.207 | [2024-08-09 22:01:30,833][Main][INFO] - [train] Step 10350 out of 80000 
| Loss --> 2.650 | Grad_l2 --> 0.636 | Weights_l2 --> 8690.965 | Lr --> 0.008 | Seconds_per_step --> 4.360 | [2024-08-09 22:04:59,449][Main][INFO] - [train] Step 10400 out of 80000 | Loss --> 2.630 | Grad_l2 --> 0.619 | Weights_l2 --> 8691.976 | Lr --> 0.008 | Seconds_per_step --> 4.172 | [2024-08-09 22:08:29,303][Main][INFO] - [train] Step 10450 out of 80000 | Loss --> 2.617 | Grad_l2 --> 0.615 | Weights_l2 --> 8693.000 | Lr --> 0.008 | Seconds_per_step --> 4.197 | [2024-08-09 22:12:03,306][Main][INFO] - [train] Step 10500 out of 80000 | Loss --> 2.627 | Grad_l2 --> 0.615 | Weights_l2 --> 8694.037 | Lr --> 0.008 | Seconds_per_step --> 4.280 | [2024-08-09 22:15:37,789][Main][INFO] - [train] Step 10550 out of 80000 | Loss --> 2.612 | Grad_l2 --> 0.594 | Weights_l2 --> 8695.071 | Lr --> 0.008 | Seconds_per_step --> 4.290 | [2024-08-09 22:19:13,830][Main][INFO] - [train] Step 10600 out of 80000 | Loss --> 2.599 | Grad_l2 --> 0.608 | Weights_l2 --> 8696.095 | Lr --> 0.008 | Seconds_per_step --> 4.321 | [2024-08-09 22:22:47,537][Main][INFO] - [train] Step 10650 out of 80000 | Loss --> 2.598 | Grad_l2 --> 0.619 | Weights_l2 --> 8697.144 | Lr --> 0.008 | Seconds_per_step --> 4.274 | [2024-08-09 22:26:27,089][Main][INFO] - [train] Step 10700 out of 80000 | Loss --> 2.602 | Grad_l2 --> 0.627 | Weights_l2 --> 8698.176 | Lr --> 0.008 | Seconds_per_step --> 4.391 | [2024-08-09 22:30:08,291][Main][INFO] - [train] Step 10750 out of 80000 | Loss --> 2.598 | Grad_l2 --> 0.603 | Weights_l2 --> 8699.195 | Lr --> 0.008 | Seconds_per_step --> 4.424 | [2024-08-09 22:33:50,515][Main][INFO] - [train] Step 10800 out of 80000 | Loss --> 2.600 | Grad_l2 --> 0.615 | Weights_l2 --> 8700.255 | Lr --> 0.008 | Seconds_per_step --> 4.444 | [2024-08-09 22:37:23,733][Main][INFO] - [train] Step 10850 out of 80000 | Loss --> 2.588 | Grad_l2 --> 0.604 | Weights_l2 --> 8701.311 | Lr --> 0.008 | Seconds_per_step --> 4.264 | [2024-08-09 22:40:59,607][Main][INFO] - [train] Step 10900 out of 80000 | Loss 
--> 2.585 | Grad_l2 --> 0.605 | Weights_l2 --> 8702.327 | Lr --> 0.008 | Seconds_per_step --> 4.317 | [2024-08-09 22:44:33,158][Main][INFO] - [train] Step 10950 out of 80000 | Loss --> 2.581 | Grad_l2 --> 0.595 | Weights_l2 --> 8703.360 | Lr --> 0.008 | Seconds_per_step --> 4.271 | [2024-08-09 22:48:11,589][Main][INFO] - [train] Step 11000 out of 80000 | Loss --> 2.580 | Grad_l2 --> 0.601 | Weights_l2 --> 8704.410 | Lr --> 0.008 | Seconds_per_step --> 4.369 | [2024-08-09 22:51:45,840][Main][INFO] - [train] Step 11050 out of 80000 | Loss --> 2.578 | Grad_l2 --> 0.587 | Weights_l2 --> 8705.448 | Lr --> 0.008 | Seconds_per_step --> 4.285 | [2024-08-09 22:55:25,388][Main][INFO] - [train] Step 11100 out of 80000 | Loss --> 2.574 | Grad_l2 --> 0.599 | Weights_l2 --> 8706.475 | Lr --> 0.008 | Seconds_per_step --> 4.391 | [2024-08-09 22:58:56,339][Main][INFO] - [train] Step 11150 out of 80000 | Loss --> 2.574 | Grad_l2 --> 0.599 | Weights_l2 --> 8707.487 | Lr --> 0.008 | Seconds_per_step --> 4.219 | [2024-08-09 23:02:28,434][Main][INFO] - [train] Step 11200 out of 80000 | Loss --> 2.577 | Grad_l2 --> 0.600 | Weights_l2 --> 8708.529 | Lr --> 0.008 | Seconds_per_step --> 4.242 | [2024-08-09 23:06:01,747][Main][INFO] - [train] Step 11250 out of 80000 | Loss --> 2.563 | Grad_l2 --> 0.582 | Weights_l2 --> 8709.582 | Lr --> 0.008 | Seconds_per_step --> 4.266 | [2024-08-09 23:09:36,821][Main][INFO] - [train] Step 11300 out of 80000 | Loss --> 2.567 | Grad_l2 --> 0.559 | Weights_l2 --> 8710.620 | Lr --> 0.008 | Seconds_per_step --> 4.301 | [2024-08-09 23:13:05,158][Main][INFO] - [train] Step 11350 out of 80000 | Loss --> 2.561 | Grad_l2 --> 0.598 | Weights_l2 --> 8711.669 | Lr --> 0.008 | Seconds_per_step --> 4.167 | [2024-08-09 23:16:34,505][Main][INFO] - [train] Step 11400 out of 80000 | Loss --> 2.555 | Grad_l2 --> 0.588 | Weights_l2 --> 8712.697 | Lr --> 0.008 | Seconds_per_step --> 4.187 | [2024-08-09 23:20:05,626][Main][INFO] - [train] Step 11450 out of 80000 | Loss --> 
2.546 | Grad_l2 --> 0.582 | Weights_l2 --> 8713.753 | Lr --> 0.008 | Seconds_per_step --> 4.222 | [2024-08-09 23:23:40,137][Main][INFO] - [train] Step 11500 out of 80000 | Loss --> 2.549 | Grad_l2 --> 0.583 | Weights_l2 --> 8714.804 | Lr --> 0.008 | Seconds_per_step --> 4.290 | [2024-08-09 23:27:11,574][Main][INFO] - [train] Step 11550 out of 80000 | Loss --> 2.536 | Grad_l2 --> 0.582 | Weights_l2 --> 8715.826 | Lr --> 0.008 | Seconds_per_step --> 4.229 | [2024-08-09 23:30:49,636][Main][INFO] - [train] Step 11600 out of 80000 | Loss --> 2.538 | Grad_l2 --> 0.576 | Weights_l2 --> 8716.881 | Lr --> 0.008 | Seconds_per_step --> 4.361 | [2024-08-09 23:34:19,586][Main][INFO] - [train] Step 11650 out of 80000 | Loss --> 2.539 | Grad_l2 --> 0.580 | Weights_l2 --> 8717.926 | Lr --> 0.008 | Seconds_per_step --> 4.199 | [2024-08-09 23:37:49,139][Main][INFO] - [train] Step 11700 out of 80000 | Loss --> 2.524 | Grad_l2 --> 0.585 | Weights_l2 --> 8718.968 | Lr --> 0.008 | Seconds_per_step --> 4.191 | [2024-08-09 23:41:15,748][Main][INFO] - [train] Step 11750 out of 80000 | Loss --> 2.531 | Grad_l2 --> 0.601 | Weights_l2 --> 8720.024 | Lr --> 0.008 | Seconds_per_step --> 4.132 | [2024-08-09 23:44:46,392][Main][INFO] - [train] Step 11800 out of 80000 | Loss --> 2.519 | Grad_l2 --> 0.586 | Weights_l2 --> 8721.064 | Lr --> 0.008 | Seconds_per_step --> 4.213 | [2024-08-09 23:48:21,044][Main][INFO] - [train] Step 11850 out of 80000 | Loss --> 2.516 | Grad_l2 --> 0.576 | Weights_l2 --> 8722.098 | Lr --> 0.008 | Seconds_per_step --> 4.293 | [2024-08-09 23:51:45,233][Main][INFO] - [train] Step 11900 out of 80000 | Loss --> 2.509 | Grad_l2 --> 0.566 | Weights_l2 --> 8723.110 | Lr --> 0.008 | Seconds_per_step --> 4.084 | [2024-08-09 23:55:18,373][Main][INFO] - [train] Step 11950 out of 80000 | Loss --> 2.508 | Grad_l2 --> 0.605 | Weights_l2 --> 8724.151 | Lr --> 0.008 | Seconds_per_step --> 4.263 | [2024-08-09 23:58:51,710][Main][INFO] - [train] Step 12000 out of 80000 | Loss --> 2.510 | 
Grad_l2 --> 0.587 | Weights_l2 --> 8725.199 | Lr --> 0.008 | Seconds_per_step --> 4.267 | [2024-08-10 00:02:27,062][Main][INFO] - [train] Step 12050 out of 80000 | Loss --> 2.502 | Grad_l2 --> 0.573 | Weights_l2 --> 8726.242 | Lr --> 0.008 | Seconds_per_step --> 4.307 | [2024-08-10 00:05:54,127][Main][INFO] - [train] Step 12100 out of 80000 | Loss --> 2.496 | Grad_l2 --> 0.583 | Weights_l2 --> 8727.250 | Lr --> 0.008 | Seconds_per_step --> 4.141 | [2024-08-10 00:09:20,349][Main][INFO] - [train] Step 12150 out of 80000 | Loss --> 2.499 | Grad_l2 --> 0.553 | Weights_l2 --> 8728.275 | Lr --> 0.008 | Seconds_per_step --> 4.124 | [2024-08-10 00:12:28,941][Main][INFO] - [train] Step 12200 out of 80000 | Loss --> 2.503 | Grad_l2 --> 0.561 | Weights_l2 --> 8729.279 | Lr --> 0.008 | Seconds_per_step --> 3.772 | [2024-08-10 00:15:19,261][Main][INFO] - [train] Step 12250 out of 80000 | Loss --> 2.494 | Grad_l2 --> 0.590 | Weights_l2 --> 8730.313 | Lr --> 0.008 | Seconds_per_step --> 3.406 | [2024-08-10 00:18:09,129][Main][INFO] - [train] Step 12300 out of 80000 | Loss --> 2.490 | Grad_l2 --> 0.552 | Weights_l2 --> 8731.341 | Lr --> 0.008 | Seconds_per_step --> 3.397 | [2024-08-10 00:20:58,085][Main][INFO] - [train] Step 12350 out of 80000 | Loss --> 2.487 | Grad_l2 --> 0.548 | Weights_l2 --> 8732.401 | Lr --> 0.008 | Seconds_per_step --> 3.379 | [2024-08-10 00:23:47,642][Main][INFO] - [train] Step 12400 out of 80000 | Loss --> 2.480 | Grad_l2 --> 0.542 | Weights_l2 --> 8733.439 | Lr --> 0.008 | Seconds_per_step --> 3.391 | [2024-08-10 00:26:37,898][Main][INFO] - [train] Step 12450 out of 80000 | Loss --> 2.481 | Grad_l2 --> 0.551 | Weights_l2 --> 8734.469 | Lr --> 0.008 | Seconds_per_step --> 3.405 | [2024-08-10 00:29:27,451][Main][INFO] - [train] Step 12500 out of 80000 | Loss --> 2.477 | Grad_l2 --> 0.558 | Weights_l2 --> 8735.510 | Lr --> 0.008 | Seconds_per_step --> 3.391 | [2024-08-10 00:32:17,116][Main][INFO] - [train] Step 12550 out of 80000 | Loss --> 2.478 | Grad_l2 
--> 0.549 | Weights_l2 --> 8736.541 | Lr --> 0.008 | Seconds_per_step --> 3.393 | [2024-08-10 00:35:06,730][Main][INFO] - [train] Step 12600 out of 80000 | Loss --> 2.470 | Grad_l2 --> 0.545 | Weights_l2 --> 8737.575 | Lr --> 0.008 | Seconds_per_step --> 3.392 | [2024-08-10 00:37:58,202][Main][INFO] - [train] Step 12650 out of 80000 | Loss --> 2.471 | Grad_l2 --> 0.547 | Weights_l2 --> 8738.595 | Lr --> 0.008 | Seconds_per_step --> 3.429 | [2024-08-10 00:40:47,794][Main][INFO] - [train] Step 12700 out of 80000 | Loss --> 2.462 | Grad_l2 --> 0.528 | Weights_l2 --> 8739.622 | Lr --> 0.008 | Seconds_per_step --> 3.392 | [2024-08-10 00:43:37,447][Main][INFO] - [train] Step 12750 out of 80000 | Loss --> 2.457 | Grad_l2 --> 0.533 | Weights_l2 --> 8740.657 | Lr --> 0.008 | Seconds_per_step --> 3.393 | [2024-08-10 00:46:25,836][Main][INFO] - [train] Step 12800 out of 80000 | Loss --> 2.461 | Grad_l2 --> 0.549 | Weights_l2 --> 8741.689 | Lr --> 0.008 | Seconds_per_step --> 3.368 | [2024-08-10 00:49:16,460][Main][INFO] - [train] Step 12850 out of 80000 | Loss --> 2.451 | Grad_l2 --> 0.531 | Weights_l2 --> 8742.743 | Lr --> 0.008 | Seconds_per_step --> 3.412 | [2024-08-10 00:52:06,465][Main][INFO] - [train] Step 12900 out of 80000 | Loss --> 2.453 | Grad_l2 --> 0.527 | Weights_l2 --> 8743.761 | Lr --> 0.008 | Seconds_per_step --> 3.400 | [2024-08-10 00:54:55,879][Main][INFO] - [train] Step 12950 out of 80000 | Loss --> 2.447 | Grad_l2 --> 0.520 | Weights_l2 --> 8744.791 | Lr --> 0.008 | Seconds_per_step --> 3.388 | [2024-08-10 00:57:44,034][Main][INFO] - [train] Step 13000 out of 80000 | Loss --> 2.448 | Grad_l2 --> 0.539 | Weights_l2 --> 8745.805 | Lr --> 0.008 | Seconds_per_step --> 3.363 | [2024-08-10 01:00:33,641][Main][INFO] - [train] Step 13050 out of 80000 | Loss --> 2.439 | Grad_l2 --> 0.511 | Weights_l2 --> 8746.858 | Lr --> 0.008 | Seconds_per_step --> 3.392 | [2024-08-10 01:03:22,747][Main][INFO] - [train] Step 13100 out of 80000 | Loss --> 2.436 | Grad_l2 --> 
0.524 | Weights_l2 --> 8747.888 | Lr --> 0.008 | Seconds_per_step --> 3.382 | [2024-08-10 01:06:11,723][Main][INFO] - [train] Step 13150 out of 80000 | Loss --> 2.438 | Grad_l2 --> 0.525 | Weights_l2 --> 8748.918 | Lr --> 0.008 | Seconds_per_step --> 3.380 | [2024-08-10 01:09:00,218][Main][INFO] - [train] Step 13200 out of 80000 | Loss --> 2.436 | Grad_l2 --> 0.519 | Weights_l2 --> 8749.966 | Lr --> 0.008 | Seconds_per_step --> 3.370 | [2024-08-10 01:11:49,462][Main][INFO] - [train] Step 13250 out of 80000 | Loss --> 2.437 | Grad_l2 --> 0.511 | Weights_l2 --> 8751.004 | Lr --> 0.008 | Seconds_per_step --> 3.385 | [2024-08-10 01:14:38,139][Main][INFO] - [train] Step 13300 out of 80000 | Loss --> 2.428 | Grad_l2 --> 0.512 | Weights_l2 --> 8752.040 | Lr --> 0.008 | Seconds_per_step --> 3.374 | [2024-08-10 01:17:26,878][Main][INFO] - [train] Step 13350 out of 80000 | Loss --> 2.426 | Grad_l2 --> 0.514 | Weights_l2 --> 8753.062 | Lr --> 0.008 | Seconds_per_step --> 3.375 | [2024-08-10 01:20:15,034][Main][INFO] - [train] Step 13400 out of 80000 | Loss --> 2.429 | Grad_l2 --> 0.509 | Weights_l2 --> 8754.109 | Lr --> 0.008 | Seconds_per_step --> 3.363 | [2024-08-10 01:23:03,363][Main][INFO] - [train] Step 13450 out of 80000 | Loss --> 2.423 | Grad_l2 --> 0.511 | Weights_l2 --> 8755.150 | Lr --> 0.008 | Seconds_per_step --> 3.367 | [2024-08-10 01:25:52,469][Main][INFO] - [train] Step 13500 out of 80000 | Loss --> 2.413 | Grad_l2 --> 0.502 | Weights_l2 --> 8756.209 | Lr --> 0.008 | Seconds_per_step --> 3.382 | [2024-08-10 01:28:40,740][Main][INFO] - [train] Step 13550 out of 80000 | Loss --> 2.422 | Grad_l2 --> 0.504 | Weights_l2 --> 8757.222 | Lr --> 0.008 | Seconds_per_step --> 3.365 | [2024-08-10 01:31:29,023][Main][INFO] - [train] Step 13600 out of 80000 | Loss --> 2.415 | Grad_l2 --> 0.495 | Weights_l2 --> 8758.279 | Lr --> 0.008 | Seconds_per_step --> 3.366 | [2024-08-10 01:34:18,913][Main][INFO] - [train] Step 13650 out of 80000 | Loss --> 2.420 | Grad_l2 --> 0.505 | 
Weights_l2 --> 8759.320 | Lr --> 0.008 | Seconds_per_step --> 3.398 | [2024-08-10 01:37:08,507][Main][INFO] - [train] Step 13700 out of 80000 | Loss --> 2.417 | Grad_l2 --> 0.500 | Weights_l2 --> 8760.334 | Lr --> 0.008 | Seconds_per_step --> 3.392 | [2024-08-10 01:39:56,547][Main][INFO] - [train] Step 13750 out of 80000 | Loss --> 2.406 | Grad_l2 --> 0.495 | Weights_l2 --> 8761.384 | Lr --> 0.008 | Seconds_per_step --> 3.361 | [2024-08-10 01:42:44,437][Main][INFO] - [train] Step 13800 out of 80000 | Loss --> 2.404 | Grad_l2 --> 0.501 | Weights_l2 --> 8762.410 | Lr --> 0.008 | Seconds_per_step --> 3.358 | [2024-08-10 01:45:33,358][Main][INFO] - [train] Step 13850 out of 80000 | Loss --> 2.397 | Grad_l2 --> 0.502 | Weights_l2 --> 8763.443 | Lr --> 0.008 | Seconds_per_step --> 3.378 | [2024-08-10 01:48:22,144][Main][INFO] - [train] Step 13900 out of 80000 | Loss --> 2.389 | Grad_l2 --> 0.492 | Weights_l2 --> 8764.465 | Lr --> 0.008 | Seconds_per_step --> 3.376 | [2024-08-10 01:51:09,910][Main][INFO] - [train] Step 13950 out of 80000 | Loss --> 2.391 | Grad_l2 --> 0.502 | Weights_l2 --> 8765.511 | Lr --> 0.008 | Seconds_per_step --> 3.355 | [2024-08-10 01:53:58,677][Main][INFO] - [train] Step 14000 out of 80000 | Loss --> 2.388 | Grad_l2 --> 0.497 | Weights_l2 --> 8766.530 | Lr --> 0.008 | Seconds_per_step --> 3.375 | [2024-08-10 01:56:48,499][Main][INFO] - [train] Step 14050 out of 80000 | Loss --> 2.373 | Grad_l2 --> 0.491 | Weights_l2 --> 8767.573 | Lr --> 0.008 | Seconds_per_step --> 3.396 | [2024-08-10 01:59:38,047][Main][INFO] - [train] Step 14100 out of 80000 | Loss --> 2.377 | Grad_l2 --> 0.503 | Weights_l2 --> 8768.588 | Lr --> 0.008 | Seconds_per_step --> 3.391 | [2024-08-10 02:02:27,734][Main][INFO] - [train] Step 14150 out of 80000 | Loss --> 2.378 | Grad_l2 --> 0.488 | Weights_l2 --> 8769.605 | Lr --> 0.008 | Seconds_per_step --> 3.394 | [2024-08-10 02:05:16,770][Main][INFO] - [train] Step 14200 out of 80000 | Loss --> 2.368 | Grad_l2 --> 0.496 | 
Weights_l2 --> 8770.616 | Lr --> 0.008 | Seconds_per_step --> 3.381 | [2024-08-10 02:08:05,603][Main][INFO] - [train] Step 14250 out of 80000 | Loss --> 2.373 | Grad_l2 --> 0.488 | Weights_l2 --> 8771.662 | Lr --> 0.008 | Seconds_per_step --> 3.377 | [2024-08-10 02:10:54,613][Main][INFO] - [train] Step 14300 out of 80000 | Loss --> 2.381 | Grad_l2 --> 0.490 | Weights_l2 --> 8772.676 | Lr --> 0.008 | Seconds_per_step --> 3.380 | [2024-08-10 02:13:44,831][Main][INFO] - [train] Step 14350 out of 80000 | Loss --> 2.371 | Grad_l2 --> 0.483 | Weights_l2 --> 8773.704 | Lr --> 0.008 | Seconds_per_step --> 3.404 | [2024-08-10 02:16:33,316][Main][INFO] - [train] Step 14400 out of 80000 | Loss --> 2.377 | Grad_l2 --> 0.487 | Weights_l2 --> 8774.735 | Lr --> 0.008 | Seconds_per_step --> 3.370 | [2024-08-10 02:19:21,805][Main][INFO] - [train] Step 14450 out of 80000 | Loss --> 2.373 | Grad_l2 --> 0.482 | Weights_l2 --> 8775.731 | Lr --> 0.008 | Seconds_per_step --> 3.370 | [2024-08-10 02:22:11,796][Main][INFO] - [train] Step 14500 out of 80000 | Loss --> 2.369 | Grad_l2 --> 0.499 | Weights_l2 --> 8776.744 | Lr --> 0.008 | Seconds_per_step --> 3.400 | [2024-08-10 02:25:01,398][Main][INFO] - [train] Step 14550 out of 80000 | Loss --> 2.364 | Grad_l2 --> 0.485 | Weights_l2 --> 8777.791 | Lr --> 0.008 | Seconds_per_step --> 3.392 | [2024-08-10 02:27:51,146][Main][INFO] - [train] Step 14600 out of 80000 | Loss --> 2.369 | Grad_l2 --> 0.481 | Weights_l2 --> 8778.816 | Lr --> 0.008 | Seconds_per_step --> 3.395 | [2024-08-10 02:30:40,279][Main][INFO] - [train] Step 14650 out of 80000 | Loss --> 2.373 | Grad_l2 --> 0.486 | Weights_l2 --> 8779.856 | Lr --> 0.008 | Seconds_per_step --> 3.383 | [2024-08-10 02:33:30,596][Main][INFO] - [train] Step 14700 out of 80000 | Loss --> 2.368 | Grad_l2 --> 0.488 | Weights_l2 --> 8780.880 | Lr --> 0.008 | Seconds_per_step --> 3.406 | [2024-08-10 02:36:19,985][Main][INFO] - [train] Step 14750 out of 80000 | Loss --> 2.364 | Grad_l2 --> 0.480 | 
Weights_l2 --> 8781.909 | Lr --> 0.008 | Seconds_per_step --> 3.388 | [2024-08-10 02:39:09,113][Main][INFO] - [train] Step 14800 out of 80000 | Loss --> 2.355 | Grad_l2 --> 0.490 | Weights_l2 --> 8782.954 | Lr --> 0.008 | Seconds_per_step --> 3.383 | [2024-08-10 02:41:58,266][Main][INFO] - [train] Step 14850 out of 80000 | Loss --> 2.363 | Grad_l2 --> 0.486 | Weights_l2 --> 8783.980 | Lr --> 0.008 | Seconds_per_step --> 3.383 | [2024-08-10 02:44:47,795][Main][INFO] - [train] Step 14900 out of 80000 | Loss --> 2.364 | Grad_l2 --> 0.479 | Weights_l2 --> 8784.994 | Lr --> 0.008 | Seconds_per_step --> 3.391 | [2024-08-10 02:47:37,307][Main][INFO] - [train] Step 14950 out of 80000 | Loss --> 2.362 | Grad_l2 --> 0.477 | Weights_l2 --> 8786.006 | Lr --> 0.008 | Seconds_per_step --> 3.390 | [2024-08-10 02:50:26,779][Main][INFO] - [train] Step 15000 out of 80000 | Loss --> 2.358 | Grad_l2 --> 0.483 | Weights_l2 --> 8787.026 | Lr --> 0.008 | Seconds_per_step --> 3.389 | [2024-08-10 02:50:26,780][accelerate.accelerator][INFO] - Saving current state to checkpoint-pt-15000 [2024-08-10 02:50:26,783][accelerate.utils.other][WARNING] - Removed shared tensor {'encoder.embed_tokens.weight', 'decoder.embed_tokens.weight'} while saving. 
This should be OK, but check by verifying that you don't receive any warning while reloading [2024-08-10 02:50:28,800][accelerate.checkpointing][INFO] - Model weights saved in checkpoint-pt-15000/model.safetensors [2024-08-10 02:50:31,549][accelerate.checkpointing][INFO] - Optimizer state saved in checkpoint-pt-15000/optimizer.bin [2024-08-10 02:50:31,550][accelerate.checkpointing][INFO] - Scheduler state saved in checkpoint-pt-15000/scheduler.bin [2024-08-10 02:50:31,550][accelerate.checkpointing][INFO] - Sampler state for dataloader 0 saved in checkpoint-pt-15000/sampler.bin [2024-08-10 02:50:31,550][accelerate.checkpointing][INFO] - Sampler state for dataloader 1 saved in checkpoint-pt-15000/sampler_1.bin [2024-08-10 02:50:31,551][accelerate.checkpointing][INFO] - Random states saved in checkpoint-pt-15000/random_states_0.pkl [2024-08-10 02:53:20,359][Main][INFO] - [train] Step 15050 out of 80000 | Loss --> 2.367 | Grad_l2 --> 0.472 | Weights_l2 --> 8788.057 | Lr --> 0.008 | Seconds_per_step --> 3.472 | [2024-08-10 02:56:09,018][Main][INFO] - [train] Step 15100 out of 80000 | Loss --> 2.358 | Grad_l2 --> 0.482 | Weights_l2 --> 8789.093 | Lr --> 0.008 | Seconds_per_step --> 3.373 | [2024-08-10 02:58:59,528][Main][INFO] - [train] Step 15150 out of 80000 | Loss --> 2.357 | Grad_l2 --> 0.474 | Weights_l2 --> 8790.099 | Lr --> 0.008 | Seconds_per_step --> 3.410 | [2024-08-10 03:01:49,372][Main][INFO] - [train] Step 15200 out of 80000 | Loss --> 2.361 | Grad_l2 --> 0.472 | Weights_l2 --> 8791.113 | Lr --> 0.008 | Seconds_per_step --> 3.397 | [2024-08-10 03:04:37,075][Main][INFO] - [train] Step 15250 out of 80000 | Loss --> 2.350 | Grad_l2 --> 0.478 | Weights_l2 --> 8792.148 | Lr --> 0.008 | Seconds_per_step --> 3.354 | [2024-08-10 03:07:24,386][Main][INFO] - [train] Step 15300 out of 80000 | Loss --> 2.356 | Grad_l2 --> 0.479 | Weights_l2 --> 8793.148 | Lr --> 0.008 | Seconds_per_step --> 3.346 | [2024-08-10 03:10:13,428][Main][INFO] - [train] Step 15350 out of 80000 
| Loss --> 2.350 | Grad_l2 --> 0.479 | Weights_l2 --> 8794.155 | Lr --> 0.008 | Seconds_per_step --> 3.381 | [2024-08-10 03:13:02,393][Main][INFO] - [train] Step 15400 out of 80000 | Loss --> 2.347 | Grad_l2 --> 0.469 | Weights_l2 --> 8795.175 | Lr --> 0.008 | Seconds_per_step --> 3.379 | [2024-08-10 03:15:51,829][Main][INFO] - [train] Step 15450 out of 80000 | Loss --> 2.347 | Grad_l2 --> 0.461 | Weights_l2 --> 8796.188 | Lr --> 0.008 | Seconds_per_step --> 3.389 | [2024-08-10 03:18:41,239][Main][INFO] - [train] Step 15500 out of 80000 | Loss --> 2.349 | Grad_l2 --> 0.468 | Weights_l2 --> 8797.217 | Lr --> 0.008 | Seconds_per_step --> 3.388 | [2024-08-10 03:21:30,852][Main][INFO] - [train] Step 15550 out of 80000 | Loss --> 2.341 | Grad_l2 --> 0.466 | Weights_l2 --> 8798.212 | Lr --> 0.008 | Seconds_per_step --> 3.392 | [2024-08-10 03:24:19,122][Main][INFO] - [train] Step 15600 out of 80000 | Loss --> 2.345 | Grad_l2 --> 0.472 | Weights_l2 --> 8799.202 | Lr --> 0.008 | Seconds_per_step --> 3.365 | [2024-08-10 03:27:08,990][Main][INFO] - [train] Step 15650 out of 80000 | Loss --> 2.350 | Grad_l2 --> 0.470 | Weights_l2 --> 8800.214 | Lr --> 0.008 | Seconds_per_step --> 3.397 | [2024-08-10 03:29:58,136][Main][INFO] - [train] Step 15700 out of 80000 | Loss --> 2.338 | Grad_l2 --> 0.473 | Weights_l2 --> 8801.228 | Lr --> 0.008 | Seconds_per_step --> 3.383 | [2024-08-10 03:32:47,841][Main][INFO] - [train] Step 15750 out of 80000 | Loss --> 2.335 | Grad_l2 --> 0.456 | Weights_l2 --> 8802.245 | Lr --> 0.008 | Seconds_per_step --> 3.394 | [2024-08-10 03:35:36,029][Main][INFO] - [train] Step 15800 out of 80000 | Loss --> 2.332 | Grad_l2 --> 0.454 | Weights_l2 --> 8803.247 | Lr --> 0.008 | Seconds_per_step --> 3.364 | [2024-08-10 03:38:25,696][Main][INFO] - [train] Step 15850 out of 80000 | Loss --> 2.329 | Grad_l2 --> 0.468 | Weights_l2 --> 8804.255 | Lr --> 0.008 | Seconds_per_step --> 3.393 | [2024-08-10 03:41:14,705][Main][INFO] - [train] Step 15900 out of 80000 | Loss 
--> 2.344 | Grad_l2 --> 0.771 | Weights_l2 --> 8805.210 | Lr --> 0.008 | Seconds_per_step --> 3.380 | [2024-08-10 03:44:05,016][Main][INFO] - [train] Step 15950 out of 80000 | Loss --> 2.336 | Grad_l2 --> 0.468 | Weights_l2 --> 8806.198 | Lr --> 0.008 | Seconds_per_step --> 3.406 | [2024-08-10 03:46:54,039][Main][INFO] - [train] Step 16000 out of 80000 | Loss --> 2.322 | Grad_l2 --> 0.466 | Weights_l2 --> 8807.208 | Lr --> 0.008 | Seconds_per_step --> 3.380 | [2024-08-10 03:49:43,020][Main][INFO] - [train] Step 16050 out of 80000 | Loss --> 2.327 | Grad_l2 --> 0.461 | Weights_l2 --> 8808.179 | Lr --> 0.008 | Seconds_per_step --> 3.380 | [2024-08-10 03:52:31,751][Main][INFO] - [train] Step 16100 out of 80000 | Loss --> 2.335 | Grad_l2 --> 0.464 | Weights_l2 --> 8809.180 | Lr --> 0.008 | Seconds_per_step --> 3.375 | [2024-08-10 03:55:21,300][Main][INFO] - [train] Step 16150 out of 80000 | Loss --> 2.332 | Grad_l2 --> 0.459 | Weights_l2 --> 8810.175 | Lr --> 0.008 | Seconds_per_step --> 3.391 | [2024-08-10 03:58:10,233][Main][INFO] - [train] Step 16200 out of 80000 | Loss --> 2.330 | Grad_l2 --> 0.459 | Weights_l2 --> 8811.153 | Lr --> 0.008 | Seconds_per_step --> 3.379 | [2024-08-10 04:00:58,805][Main][INFO] - [train] Step 16250 out of 80000 | Loss --> 2.328 | Grad_l2 --> 0.452 | Weights_l2 --> 8812.148 | Lr --> 0.008 | Seconds_per_step --> 3.371 | [2024-08-10 04:03:47,753][Main][INFO] - [train] Step 16300 out of 80000 | Loss --> 2.327 | Grad_l2 --> 0.457 | Weights_l2 --> 8813.155 | Lr --> 0.008 | Seconds_per_step --> 3.379 | [2024-08-10 04:06:37,968][Main][INFO] - [train] Step 16350 out of 80000 | Loss --> 2.315 | Grad_l2 --> 0.457 | Weights_l2 --> 8814.120 | Lr --> 0.008 | Seconds_per_step --> 3.404 | [2024-08-10 04:09:27,345][Main][INFO] - [train] Step 16400 out of 80000 | Loss --> 2.323 | Grad_l2 --> 0.451 | Weights_l2 --> 8815.101 | Lr --> 0.008 | Seconds_per_step --> 3.388 | [2024-08-10 04:12:16,639][Main][INFO] - [train] Step 16450 out of 80000 | Loss --> 
2.323 | Grad_l2 --> 0.454 | Weights_l2 --> 8816.109 | Lr --> 0.008 | Seconds_per_step --> 3.386 | [2024-08-10 04:15:06,000][Main][INFO] - [train] Step 16500 out of 80000 | Loss --> 2.316 | Grad_l2 --> 0.461 | Weights_l2 --> 8817.094 | Lr --> 0.008 | Seconds_per_step --> 3.387 | [2024-08-10 04:18:00,644][Main][INFO] - [train] Step 16550 out of 80000 | Loss --> 2.316 | Grad_l2 --> 0.454 | Weights_l2 --> 8818.060 | Lr --> 0.008 | Seconds_per_step --> 3.493 | [2024-08-10 04:20:52,878][Main][INFO] - [train] Step 16600 out of 80000 | Loss --> 2.325 | Grad_l2 --> 0.447 | Weights_l2 --> 8819.026 | Lr --> 0.008 | Seconds_per_step --> 3.445 | [2024-08-10 04:23:41,401][Main][INFO] - [train] Step 16650 out of 80000 | Loss --> 2.310 | Grad_l2 --> 0.456 | Weights_l2 --> 8820.003 | Lr --> 0.008 | Seconds_per_step --> 3.370 | [2024-08-10 04:26:30,469][Main][INFO] - [train] Step 16700 out of 80000 | Loss --> 2.312 | Grad_l2 --> 0.451 | Weights_l2 --> 8821.005 | Lr --> 0.008 | Seconds_per_step --> 3.381 | [2024-08-10 04:29:19,628][Main][INFO] - [train] Step 16750 out of 80000 | Loss --> 2.324 | Grad_l2 --> 0.451 | Weights_l2 --> 8821.988 | Lr --> 0.008 | Seconds_per_step --> 3.383 | [2024-08-10 04:32:09,203][Main][INFO] - [train] Step 16800 out of 80000 | Loss --> 2.308 | Grad_l2 --> 0.450 | Weights_l2 --> 8822.952 | Lr --> 0.008 | Seconds_per_step --> 3.391 | [2024-08-10 04:35:05,100][Main][INFO] - [train] Step 16850 out of 80000 | Loss --> 2.294 | Grad_l2 --> 0.446 | Weights_l2 --> 8823.904 | Lr --> 0.008 | Seconds_per_step --> 3.518 | [2024-08-10 04:37:58,342][Main][INFO] - [train] Step 16900 out of 80000 | Loss --> 2.310 | Grad_l2 --> 0.454 | Weights_l2 --> 8824.866 | Lr --> 0.008 | Seconds_per_step --> 3.465 | [2024-08-10 04:40:55,063][Main][INFO] - [train] Step 16950 out of 80000 | Loss --> 2.294 | Grad_l2 --> 0.449 | Weights_l2 --> 8825.837 | Lr --> 0.008 | Seconds_per_step --> 3.534 | [2024-08-10 04:44:25,073][Main][INFO] - [train] Step 17000 out of 80000 | Loss --> 2.298 | 
Grad_l2 --> 0.448 | Weights_l2 --> 8826.792 | Lr --> 0.008 | Seconds_per_step --> 4.200 | [2024-08-10 04:47:13,391][Main][INFO] - [train] Step 17050 out of 80000 | Loss --> 2.300 | Grad_l2 --> 0.441 | Weights_l2 --> 8827.769 | Lr --> 0.008 | Seconds_per_step --> 3.366 | [2024-08-10 04:50:17,595][Main][INFO] - [train] Step 17100 out of 80000 | Loss --> 2.300 | Grad_l2 --> 0.439 | Weights_l2 --> 8828.744 | Lr --> 0.008 | Seconds_per_step --> 3.684 | [2024-08-10 04:53:21,981][Main][INFO] - [train] Step 17150 out of 80000 | Loss --> 2.300 | Grad_l2 --> 0.443 | Weights_l2 --> 8829.696 | Lr --> 0.008 | Seconds_per_step --> 3.688 | [2024-08-10 04:56:15,559][Main][INFO] - [train] Step 17200 out of 80000 | Loss --> 2.301 | Grad_l2 --> 0.447 | Weights_l2 --> 8830.652 | Lr --> 0.008 | Seconds_per_step --> 3.472 | [2024-08-10 04:59:19,644][Main][INFO] - [train] Step 17250 out of 80000 | Loss --> 2.299 | Grad_l2 --> 0.441 | Weights_l2 --> 8831.603 | Lr --> 0.008 | Seconds_per_step --> 3.682 | [2024-08-10 05:03:08,540][Main][INFO] - [train] Step 17300 out of 80000 | Loss --> 2.298 | Grad_l2 --> 0.441 | Weights_l2 --> 8832.566 | Lr --> 0.008 | Seconds_per_step --> 4.578 | [2024-08-10 05:06:12,612][Main][INFO] - [train] Step 17350 out of 80000 | Loss --> 2.292 | Grad_l2 --> 0.442 | Weights_l2 --> 8833.511 | Lr --> 0.008 | Seconds_per_step --> 3.681 | [2024-08-10 05:09:09,744][Main][INFO] - [train] Step 17400 out of 80000 | Loss --> 2.295 | Grad_l2 --> 0.436 | Weights_l2 --> 8834.479 | Lr --> 0.008 | Seconds_per_step --> 3.543 | [2024-08-10 05:12:04,488][Main][INFO] - [train] Step 17450 out of 80000 | Loss --> 2.292 | Grad_l2 --> 0.441 | Weights_l2 --> 8835.436 | Lr --> 0.008 | Seconds_per_step --> 3.495 | [2024-08-10 05:15:05,017][Main][INFO] - [train] Step 17500 out of 80000 | Loss --> 2.293 | Grad_l2 --> 0.446 | Weights_l2 --> 8836.425 | Lr --> 0.008 | Seconds_per_step --> 3.611 | [2024-08-10 05:17:58,854][Main][INFO] - [train] Step 17550 out of 80000 | Loss --> 2.286 | Grad_l2 
--> 0.437 | Weights_l2 --> 8837.398 | Lr --> 0.008 | Seconds_per_step --> 3.477 | [2024-08-10 05:21:01,251][Main][INFO] - [train] Step 17600 out of 80000 | Loss --> 2.293 | Grad_l2 --> 0.438 | Weights_l2 --> 8838.359 | Lr --> 0.008 | Seconds_per_step --> 3.648 | [2024-08-10 05:23:50,306][Main][INFO] - [train] Step 17650 out of 80000 | Loss --> 2.290 | Grad_l2 --> 0.440 | Weights_l2 --> 8839.301 | Lr --> 0.008 | Seconds_per_step --> 3.381 | [2024-08-10 05:26:39,934][Main][INFO] - [train] Step 17700 out of 80000 | Loss --> 2.279 | Grad_l2 --> 0.437 | Weights_l2 --> 8840.279 | Lr --> 0.008 | Seconds_per_step --> 3.393 | [2024-08-10 05:29:31,132][Main][INFO] - [train] Step 17750 out of 80000 | Loss --> 2.295 | Grad_l2 --> 0.435 | Weights_l2 --> 8841.235 | Lr --> 0.008 | Seconds_per_step --> 3.424 | [2024-08-10 05:32:28,592][Main][INFO] - [train] Step 17800 out of 80000 | Loss --> 2.285 | Grad_l2 --> 0.439 | Weights_l2 --> 8842.177 | Lr --> 0.008 | Seconds_per_step --> 3.549 | [2024-08-10 05:35:29,530][Main][INFO] - [train] Step 17850 out of 80000 | Loss --> 2.278 | Grad_l2 --> 0.438 | Weights_l2 --> 8843.147 | Lr --> 0.008 | Seconds_per_step --> 3.619 | [2024-08-10 05:38:26,746][Main][INFO] - [train] Step 17900 out of 80000 | Loss --> 2.280 | Grad_l2 --> 0.433 | Weights_l2 --> 8844.071 | Lr --> 0.008 | Seconds_per_step --> 3.544 | [2024-08-10 05:41:16,386][Main][INFO] - [train] Step 17950 out of 80000 | Loss --> 2.279 | Grad_l2 --> 0.429 | Weights_l2 --> 8845.032 | Lr --> 0.008 | Seconds_per_step --> 3.393 | [2024-08-10 05:44:05,564][Main][INFO] - [train] Step 18000 out of 80000 | Loss --> 2.274 | Grad_l2 --> 0.439 | Weights_l2 --> 8845.972 | Lr --> 0.008 | Seconds_per_step --> 3.384 | [2024-08-10 05:46:54,634][Main][INFO] - [train] Step 18050 out of 80000 | Loss --> 2.272 | Grad_l2 --> 0.434 | Weights_l2 --> 8846.896 | Lr --> 0.008 | Seconds_per_step --> 3.381 | [2024-08-10 05:49:43,407][Main][INFO] - [train] Step 18100 out of 80000 | Loss --> 2.269 | Grad_l2 --> 
0.430 | Weights_l2 --> 8847.847 | Lr --> 0.008 | Seconds_per_step --> 3.375 | [2024-08-10 05:52:32,975][Main][INFO] - [train] Step 18150 out of 80000 | Loss --> 2.268 | Grad_l2 --> 0.433 | Weights_l2 --> 8848.785 | Lr --> 0.008 | Seconds_per_step --> 3.391 | [2024-08-10 05:55:22,235][Main][INFO] - [train] Step 18200 out of 80000 | Loss --> 2.274 | Grad_l2 --> 0.428 | Weights_l2 --> 8849.744 | Lr --> 0.008 | Seconds_per_step --> 3.385 | [2024-08-10 05:58:12,437][Main][INFO] - [train] Step 18250 out of 80000 | Loss --> 2.274 | Grad_l2 --> 0.438 | Weights_l2 --> 8850.687 | Lr --> 0.008 | Seconds_per_step --> 3.404 | [2024-08-10 06:01:01,584][Main][INFO] - [train] Step 18300 out of 80000 | Loss --> 2.272 | Grad_l2 --> 0.437 | Weights_l2 --> 8851.617 | Lr --> 0.008 | Seconds_per_step --> 3.383 | [2024-08-10 06:03:50,792][Main][INFO] - [train] Step 18350 out of 80000 | Loss --> 2.262 | Grad_l2 --> 0.425 | Weights_l2 --> 8852.557 | Lr --> 0.008 | Seconds_per_step --> 3.384 | [2024-08-10 06:06:40,061][Main][INFO] - [train] Step 18400 out of 80000 | Loss --> 2.265 | Grad_l2 --> 0.427 | Weights_l2 --> 8853.478 | Lr --> 0.008 | Seconds_per_step --> 3.385 | [2024-08-10 06:09:28,691][Main][INFO] - [train] Step 18450 out of 80000 | Loss --> 2.250 | Grad_l2 --> 0.427 | Weights_l2 --> 8854.413 | Lr --> 0.008 | Seconds_per_step --> 3.373 | [2024-08-10 06:12:17,730][Main][INFO] - [train] Step 18500 out of 80000 | Loss --> 2.258 | Grad_l2 --> 0.432 | Weights_l2 --> 8855.354 | Lr --> 0.008 | Seconds_per_step --> 3.381 | [2024-08-10 06:15:06,731][Main][INFO] - [train] Step 18550 out of 80000 | Loss --> 2.264 | Grad_l2 --> 0.428 | Weights_l2 --> 8856.260 | Lr --> 0.008 | Seconds_per_step --> 3.380 | [2024-08-10 06:17:55,946][Main][INFO] - [train] Step 18600 out of 80000 | Loss --> 2.257 | Grad_l2 --> 0.429 | Weights_l2 --> 8857.174 | Lr --> 0.008 | Seconds_per_step --> 3.384 | [2024-08-10 06:20:45,707][Main][INFO] - [train] Step 18650 out of 80000 | Loss --> 2.251 | Grad_l2 --> 0.427 | 
Weights_l2 --> 8858.085 | Lr --> 0.008 | Seconds_per_step --> 3.395 | [2024-08-10 06:23:34,836][Main][INFO] - [train] Step 18700 out of 80000 | Loss --> 2.262 | Grad_l2 --> 0.427 | Weights_l2 --> 8859.009 | Lr --> 0.008 | Seconds_per_step --> 3.383 | [2024-08-10 06:26:23,643][Main][INFO] - [train] Step 18750 out of 80000 | Loss --> 2.253 | Grad_l2 --> 0.421 | Weights_l2 --> 8859.915 | Lr --> 0.008 | Seconds_per_step --> 3.376 | [2024-08-10 06:29:11,354][Main][INFO] - [train] Step 18800 out of 80000 | Loss --> 2.250 | Grad_l2 --> 0.424 | Weights_l2 --> 8860.852 | Lr --> 0.008 | Seconds_per_step --> 3.354 | [2024-08-10 06:32:00,884][Main][INFO] - [train] Step 18850 out of 80000 | Loss --> 2.242 | Grad_l2 --> 0.422 | Weights_l2 --> 8861.763 | Lr --> 0.008 | Seconds_per_step --> 3.391 | [2024-08-10 06:34:48,915][Main][INFO] - [train] Step 18900 out of 80000 | Loss --> 2.253 | Grad_l2 --> 0.419 | Weights_l2 --> 8862.654 | Lr --> 0.008 | Seconds_per_step --> 3.361 | [2024-08-10 06:37:37,887][Main][INFO] - [train] Step 18950 out of 80000 | Loss --> 2.241 | Grad_l2 --> 0.420 | Weights_l2 --> 8863.549 | Lr --> 0.008 | Seconds_per_step --> 3.379 | [2024-08-10 06:40:26,873][Main][INFO] - [train] Step 19000 out of 80000 | Loss --> 2.256 | Grad_l2 --> 0.419 | Weights_l2 --> 8864.466 | Lr --> 0.008 | Seconds_per_step --> 3.380 | [2024-08-10 06:43:16,993][Main][INFO] - [train] Step 19050 out of 80000 | Loss --> 2.244 | Grad_l2 --> 0.415 | Weights_l2 --> 8865.372 | Lr --> 0.008 | Seconds_per_step --> 3.402 | [2024-08-10 06:46:06,020][Main][INFO] - [train] Step 19100 out of 80000 | Loss --> 2.242 | Grad_l2 --> 0.415 | Weights_l2 --> 8866.273 | Lr --> 0.008 | Seconds_per_step --> 3.381 | [2024-08-10 06:48:54,927][Main][INFO] - [train] Step 19150 out of 80000 | Loss --> 2.241 | Grad_l2 --> 0.419 | Weights_l2 --> 8867.179 | Lr --> 0.008 | Seconds_per_step --> 3.378 | [2024-08-10 06:51:43,818][Main][INFO] - [train] Step 19200 out of 80000 | Loss --> 2.252 | Grad_l2 --> 0.418 | 
Weights_l2 --> 8868.077 | Lr --> 0.008 | Seconds_per_step --> 3.378 | [2024-08-10 06:54:33,553][Main][INFO] - [train] Step 19250 out of 80000 | Loss --> 2.245 | Grad_l2 --> 0.416 | Weights_l2 --> 8868.959 | Lr --> 0.008 | Seconds_per_step --> 3.395 | [2024-08-10 06:57:22,776][Main][INFO] - [train] Step 19300 out of 80000 | Loss --> 2.239 | Grad_l2 --> 0.419 | Weights_l2 --> 8869.832 | Lr --> 0.008 | Seconds_per_step --> 3.384 | [2024-08-10 07:00:11,595][Main][INFO] - [train] Step 19350 out of 80000 | Loss --> 2.237 | Grad_l2 --> 0.414 | Weights_l2 --> 8870.736 | Lr --> 0.008 | Seconds_per_step --> 3.376 | [2024-08-10 07:03:00,088][Main][INFO] - [train] Step 19400 out of 80000 | Loss --> 2.224 | Grad_l2 --> 0.416 | Weights_l2 --> 8871.632 | Lr --> 0.008 | Seconds_per_step --> 3.370 | [2024-08-10 07:05:49,818][Main][INFO] - [train] Step 19450 out of 80000 | Loss --> 2.228 | Grad_l2 --> 0.416 | Weights_l2 --> 8872.502 | Lr --> 0.008 | Seconds_per_step --> 3.395 | [2024-08-10 07:08:38,161][Main][INFO] - [train] Step 19500 out of 80000 | Loss --> 2.229 | Grad_l2 --> 0.412 | Weights_l2 --> 8873.395 | Lr --> 0.008 | Seconds_per_step --> 3.367 | [2024-08-10 07:11:26,283][Main][INFO] - [train] Step 19550 out of 80000 | Loss --> 2.228 | Grad_l2 --> 0.418 | Weights_l2 --> 8874.282 | Lr --> 0.008 | Seconds_per_step --> 3.362 | [2024-08-10 07:14:14,560][Main][INFO] - [train] Step 19600 out of 80000 | Loss --> 2.228 | Grad_l2 --> 0.409 | Weights_l2 --> 8875.190 | Lr --> 0.008 | Seconds_per_step --> 3.366 | [2024-08-10 07:17:03,585][Main][INFO] - [train] Step 19650 out of 80000 | Loss --> 2.224 | Grad_l2 --> 0.408 | Weights_l2 --> 8876.067 | Lr --> 0.008 | Seconds_per_step --> 3.380 | [2024-08-10 07:19:52,539][Main][INFO] - [train] Step 19700 out of 80000 | Loss --> 2.223 | Grad_l2 --> 0.411 | Weights_l2 --> 8876.951 | Lr --> 0.008 | Seconds_per_step --> 3.379 | [2024-08-10 07:22:41,624][Main][INFO] - [train] Step 19750 out of 80000 | Loss --> 2.204 | Grad_l2 --> 0.407 | 
Weights_l2 --> 8877.825 | Lr --> 0.008 | Seconds_per_step --> 3.382 | [2024-08-10 07:25:30,616][Main][INFO] - [train] Step 19800 out of 80000 | Loss --> 2.228 | Grad_l2 --> 0.412 | Weights_l2 --> 8878.734 | Lr --> 0.008 | Seconds_per_step --> 3.380 | [2024-08-10 07:28:19,413][Main][INFO] - [train] Step 19850 out of 80000 | Loss --> 2.219 | Grad_l2 --> 0.407 | Weights_l2 --> 8879.608 | Lr --> 0.008 | Seconds_per_step --> 3.376 | [2024-08-10 07:31:08,190][Main][INFO] - [train] Step 19900 out of 80000 | Loss --> 2.221 | Grad_l2 --> 0.405 | Weights_l2 --> 8880.499 | Lr --> 0.008 | Seconds_per_step --> 3.376 | [2024-08-10 07:33:58,992][Main][INFO] - [train] Step 19950 out of 80000 | Loss --> 2.209 | Grad_l2 --> 0.411 | Weights_l2 --> 8881.370 | Lr --> 0.008 | Seconds_per_step --> 3.416 | [2024-08-10 07:36:48,090][Main][INFO] - [train] Step 20000 out of 80000 | Loss --> 2.216 | Grad_l2 --> 0.408 | Weights_l2 --> 8882.234 | Lr --> 0.008 | Seconds_per_step --> 3.382 | [2024-08-10 07:36:48,091][accelerate.accelerator][INFO] - Saving current state to checkpoint-pt-20000 [2024-08-10 07:36:48,094][accelerate.utils.other][WARNING] - Removed shared tensor {'encoder.embed_tokens.weight', 'decoder.embed_tokens.weight'} while saving. 
This should be OK, but check by verifying that you don't receive any warning while reloading [2024-08-10 07:36:50,075][accelerate.checkpointing][INFO] - Model weights saved in checkpoint-pt-20000/model.safetensors [2024-08-10 07:36:52,974][accelerate.checkpointing][INFO] - Optimizer state saved in checkpoint-pt-20000/optimizer.bin [2024-08-10 07:36:52,974][accelerate.checkpointing][INFO] - Scheduler state saved in checkpoint-pt-20000/scheduler.bin [2024-08-10 07:36:52,975][accelerate.checkpointing][INFO] - Sampler state for dataloader 0 saved in checkpoint-pt-20000/sampler.bin [2024-08-10 07:36:52,975][accelerate.checkpointing][INFO] - Sampler state for dataloader 1 saved in checkpoint-pt-20000/sampler_1.bin [2024-08-10 07:36:52,975][accelerate.checkpointing][INFO] - Random states saved in checkpoint-pt-20000/random_states_0.pkl [2024-08-10 07:39:42,190][Main][INFO] - [train] Step 20050 out of 80000 | Loss --> 2.200 | Grad_l2 --> 0.407 | Weights_l2 --> 8883.086 | Lr --> 0.008 | Seconds_per_step --> 3.482 | [2024-08-10 07:42:32,658][Main][INFO] - [train] Step 20100 out of 80000 | Loss --> 2.198 | Grad_l2 --> 0.404 | Weights_l2 --> 8883.945 | Lr --> 0.008 | Seconds_per_step --> 3.409 | [2024-08-10 07:45:21,533][Main][INFO] - [train] Step 20150 out of 80000 | Loss --> 2.202 | Grad_l2 --> 0.408 | Weights_l2 --> 8884.806 | Lr --> 0.008 | Seconds_per_step --> 3.377 | [2024-08-10 07:48:10,447][Main][INFO] - [train] Step 20200 out of 80000 | Loss --> 2.203 | Grad_l2 --> 0.407 | Weights_l2 --> 8885.699 | Lr --> 0.008 | Seconds_per_step --> 3.378 | [2024-08-10 07:50:58,905][Main][INFO] - [train] Step 20250 out of 80000 | Loss --> 2.196 | Grad_l2 --> 0.404 | Weights_l2 --> 8886.567 | Lr --> 0.008 | Seconds_per_step --> 3.369 | [2024-08-10 07:53:48,181][Main][INFO] - [train] Step 20300 out of 80000 | Loss --> 2.197 | Grad_l2 --> 0.406 | Weights_l2 --> 8887.444 | Lr --> 0.008 | Seconds_per_step --> 3.386 | [2024-08-10 07:56:36,986][Main][INFO] - [train] Step 20350 out of 80000 
| Loss --> 2.196 | Grad_l2 --> 0.403 | Weights_l2 --> 8888.293 | Lr --> 0.008 | Seconds_per_step --> 3.376 | [2024-08-10 07:59:25,941][Main][INFO] - [train] Step 20400 out of 80000 | Loss --> 2.193 | Grad_l2 --> 0.406 | Weights_l2 --> 8889.139 | Lr --> 0.008 | Seconds_per_step --> 3.379 | [2024-08-10 08:02:15,456][Main][INFO] - [train] Step 20450 out of 80000 | Loss --> 2.192 | Grad_l2 --> 0.407 | Weights_l2 --> 8889.993 | Lr --> 0.008 | Seconds_per_step --> 3.390 | [2024-08-10 08:05:04,967][Main][INFO] - [train] Step 20500 out of 80000 | Loss --> 2.198 | Grad_l2 --> 0.399 | Weights_l2 --> 8890.854 | Lr --> 0.008 | Seconds_per_step --> 3.390 | [2024-08-10 08:07:54,096][Main][INFO] - [train] Step 20550 out of 80000 | Loss --> 2.189 | Grad_l2 --> 0.403 | Weights_l2 --> 8891.715 | Lr --> 0.008 | Seconds_per_step --> 3.383 | [2024-08-10 08:10:43,683][Main][INFO] - [train] Step 20600 out of 80000 | Loss --> 2.193 | Grad_l2 --> 0.398 | Weights_l2 --> 8892.575 | Lr --> 0.008 | Seconds_per_step --> 3.392 | [2024-08-10 08:13:31,893][Main][INFO] - [train] Step 20650 out of 80000 | Loss --> 2.186 | Grad_l2 --> 0.403 | Weights_l2 --> 8893.421 | Lr --> 0.008 | Seconds_per_step --> 3.364 | [2024-08-10 08:16:21,983][Main][INFO] - [train] Step 20700 out of 80000 | Loss --> 2.183 | Grad_l2 --> 0.399 | Weights_l2 --> 8894.287 | Lr --> 0.008 | Seconds_per_step --> 3.402 | [2024-08-10 08:19:11,096][Main][INFO] - [train] Step 20750 out of 80000 | Loss --> 2.183 | Grad_l2 --> 0.397 | Weights_l2 --> 8895.130 | Lr --> 0.008 | Seconds_per_step --> 3.382 | [2024-08-10 08:21:59,564][Main][INFO] - [train] Step 20800 out of 80000 | Loss --> 2.176 | Grad_l2 --> 0.404 | Weights_l2 --> 8895.973 | Lr --> 0.008 | Seconds_per_step --> 3.369 | [2024-08-10 08:24:48,772][Main][INFO] - [train] Step 20850 out of 80000 | Loss --> 2.183 | Grad_l2 --> 0.399 | Weights_l2 --> 8896.827 | Lr --> 0.008 | Seconds_per_step --> 3.384 | [2024-08-10 08:27:39,040][Main][INFO] - [train] Step 20900 out of 80000 | Loss 
--> 2.181 | Grad_l2 --> 0.398 | Weights_l2 --> 8897.692 | Lr --> 0.008 | Seconds_per_step --> 3.405 | [2024-08-10 08:30:28,088][Main][INFO] - [train] Step 20950 out of 80000 | Loss --> 2.169 | Grad_l2 --> 0.396 | Weights_l2 --> 8898.537 | Lr --> 0.008 | Seconds_per_step --> 3.381 | [2024-08-10 08:33:15,709][Main][INFO] - [train] Step 21000 out of 80000 | Loss --> 2.174 | Grad_l2 --> 0.403 | Weights_l2 --> 8899.403 | Lr --> 0.008 | Seconds_per_step --> 3.352 | [2024-08-10 08:36:05,184][Main][INFO] - [train] Step 21050 out of 80000 | Loss --> 2.173 | Grad_l2 --> 0.395 | Weights_l2 --> 8900.227 | Lr --> 0.008 | Seconds_per_step --> 3.389 | [2024-08-10 08:38:54,935][Main][INFO] - [train] Step 21100 out of 80000 | Loss --> 2.164 | Grad_l2 --> 0.394 | Weights_l2 --> 8901.058 | Lr --> 0.008 | Seconds_per_step --> 3.395 | [2024-08-10 08:41:44,183][Main][INFO] - [train] Step 21150 out of 80000 | Loss --> 2.172 | Grad_l2 --> 0.396 | Weights_l2 --> 8901.899 | Lr --> 0.008 | Seconds_per_step --> 3.385 | [2024-08-10 08:44:34,115][Main][INFO] - [train] Step 21200 out of 80000 | Loss --> 2.164 | Grad_l2 --> 0.399 | Weights_l2 --> 8902.719 | Lr --> 0.008 | Seconds_per_step --> 3.399 | [2024-08-10 08:47:23,707][Main][INFO] - [train] Step 21250 out of 80000 | Loss --> 2.162 | Grad_l2 --> 0.399 | Weights_l2 --> 8903.545 | Lr --> 0.008 | Seconds_per_step --> 3.392 | [2024-08-10 08:50:13,259][Main][INFO] - [train] Step 21300 out of 80000 | Loss --> 2.152 | Grad_l2 --> 0.394 | Weights_l2 --> 8904.388 | Lr --> 0.007 | Seconds_per_step --> 3.391 | [2024-08-10 08:53:03,350][Main][INFO] - [train] Step 21350 out of 80000 | Loss --> 2.161 | Grad_l2 --> 0.397 | Weights_l2 --> 8905.227 | Lr --> 0.007 | Seconds_per_step --> 3.402 | [2024-08-10 08:55:51,861][Main][INFO] - [train] Step 21400 out of 80000 | Loss --> 2.153 | Grad_l2 --> 0.396 | Weights_l2 --> 8906.073 | Lr --> 0.007 | Seconds_per_step --> 3.370 | [2024-08-10 08:58:41,307][Main][INFO] - [train] Step 21450 out of 80000 | Loss --> 
2.151 | Grad_l2 --> 0.390 | Weights_l2 --> 8906.885 | Lr --> 0.007 | Seconds_per_step --> 3.389 | [2024-08-10 09:01:31,004][Main][INFO] - [train] Step 21500 out of 80000 | Loss --> 2.152 | Grad_l2 --> 0.390 | Weights_l2 --> 8907.704 | Lr --> 0.007 | Seconds_per_step --> 3.394 | [2024-08-10 09:04:20,626][Main][INFO] - [train] Step 21550 out of 80000 | Loss --> 2.140 | Grad_l2 --> 0.392 | Weights_l2 --> 8908.519 | Lr --> 0.007 | Seconds_per_step --> 3.392 | [2024-08-10 09:07:09,329][Main][INFO] - [train] Step 21600 out of 80000 | Loss --> 2.142 | Grad_l2 --> 0.392 | Weights_l2 --> 8909.337 | Lr --> 0.007 | Seconds_per_step --> 3.374 | [2024-08-10 09:09:58,896][Main][INFO] - [train] Step 21650 out of 80000 | Loss --> 2.142 | Grad_l2 --> 0.396 | Weights_l2 --> 8910.161 | Lr --> 0.007 | Seconds_per_step --> 3.391 | [2024-08-10 09:12:47,641][Main][INFO] - [train] Step 21700 out of 80000 | Loss --> 2.138 | Grad_l2 --> 0.393 | Weights_l2 --> 8910.985 | Lr --> 0.007 | Seconds_per_step --> 3.375 | [2024-08-10 09:15:36,715][Main][INFO] - [train] Step 21750 out of 80000 | Loss --> 2.137 | Grad_l2 --> 0.389 | Weights_l2 --> 8911.803 | Lr --> 0.007 | Seconds_per_step --> 3.381 | [2024-08-10 09:18:25,804][Main][INFO] - [train] Step 21800 out of 80000 | Loss --> 2.124 | Grad_l2 --> 0.388 | Weights_l2 --> 8912.593 | Lr --> 0.007 | Seconds_per_step --> 3.382 | [2024-08-10 09:21:14,976][Main][INFO] - [train] Step 21850 out of 80000 | Loss --> 2.127 | Grad_l2 --> 0.388 | Weights_l2 --> 8913.424 | Lr --> 0.007 | Seconds_per_step --> 3.383 | [2024-08-10 09:24:04,906][Main][INFO] - [train] Step 21900 out of 80000 | Loss --> 2.127 | Grad_l2 --> 0.391 | Weights_l2 --> 8914.230 | Lr --> 0.007 | Seconds_per_step --> 3.399 | [2024-08-10 09:26:55,122][Main][INFO] - [train] Step 21950 out of 80000 | Loss --> 2.129 | Grad_l2 --> 0.389 | Weights_l2 --> 8915.052 | Lr --> 0.007 | Seconds_per_step --> 3.404 | [2024-08-10 09:29:44,540][Main][INFO] - [train] Step 22000 out of 80000 | Loss --> 2.125 | 
Grad_l2 --> 0.389 | Weights_l2 --> 8915.853 | Lr --> 0.007 | Seconds_per_step --> 3.388 | [2024-08-10 09:32:34,046][Main][INFO] - [train] Step 22050 out of 80000 | Loss --> 2.122 | Grad_l2 --> 0.395 | Weights_l2 --> 8916.661 | Lr --> 0.007 | Seconds_per_step --> 3.390 | [2024-08-10 09:35:21,952][Main][INFO] - [train] Step 22100 out of 80000 | Loss --> 2.119 | Grad_l2 --> 0.385 | Weights_l2 --> 8917.495 | Lr --> 0.007 | Seconds_per_step --> 3.358 | [2024-08-10 09:38:11,003][Main][INFO] - [train] Step 22150 out of 80000 | Loss --> 2.122 | Grad_l2 --> 0.391 | Weights_l2 --> 8918.291 | Lr --> 0.007 | Seconds_per_step --> 3.381 | [2024-08-10 09:40:58,413][Main][INFO] - [train] Step 22200 out of 80000 | Loss --> 2.117 | Grad_l2 --> 0.387 | Weights_l2 --> 8919.096 | Lr --> 0.007 | Seconds_per_step --> 3.348 | [2024-08-10 09:43:47,159][Main][INFO] - [train] Step 22250 out of 80000 | Loss --> 2.121 | Grad_l2 --> 0.384 | Weights_l2 --> 8919.898 | Lr --> 0.007 | Seconds_per_step --> 3.375 | [2024-08-10 09:46:36,610][Main][INFO] - [train] Step 22300 out of 80000 | Loss --> 2.113 | Grad_l2 --> 0.388 | Weights_l2 --> 8920.716 | Lr --> 0.007 | Seconds_per_step --> 3.389 | [2024-08-10 09:49:27,851][Main][INFO] - [train] Step 22350 out of 80000 | Loss --> 2.115 | Grad_l2 --> 0.384 | Weights_l2 --> 8921.523 | Lr --> 0.007 | Seconds_per_step --> 3.425 | [2024-08-10 09:52:17,105][Main][INFO] - [train] Step 22400 out of 80000 | Loss --> 2.114 | Grad_l2 --> 0.388 | Weights_l2 --> 8922.301 | Lr --> 0.007 | Seconds_per_step --> 3.385 | [2024-08-10 09:55:06,547][Main][INFO] - [train] Step 22450 out of 80000 | Loss --> 2.123 | Grad_l2 --> 0.384 | Weights_l2 --> 8923.100 | Lr --> 0.007 | Seconds_per_step --> 3.389 | [2024-08-10 09:57:55,448][Main][INFO] - [train] Step 22500 out of 80000 | Loss --> 2.118 | Grad_l2 --> 0.386 | Weights_l2 --> 8923.925 | Lr --> 0.007 | Seconds_per_step --> 3.378 | [2024-08-10 10:00:44,643][Main][INFO] - [train] Step 22550 out of 80000 | Loss --> 2.113 | Grad_l2 
--> 0.383 | Weights_l2 --> 8924.712 | Lr --> 0.007 | Seconds_per_step --> 3.384 | [2024-08-10 10:03:33,602][Main][INFO] - [train] Step 22600 out of 80000 | Loss --> 2.121 | Grad_l2 --> 0.386 | Weights_l2 --> 8925.491 | Lr --> 0.007 | Seconds_per_step --> 3.379 | [2024-08-10 10:06:22,538][Main][INFO] - [train] Step 22650 out of 80000 | Loss --> 2.113 | Grad_l2 --> 0.384 | Weights_l2 --> 8926.277 | Lr --> 0.007 | Seconds_per_step --> 3.379 | [2024-08-10 10:09:11,569][Main][INFO] - [train] Step 22700 out of 80000 | Loss --> 2.107 | Grad_l2 --> 0.385 | Weights_l2 --> 8927.058 | Lr --> 0.007 | Seconds_per_step --> 3.381 | [2024-08-10 10:12:00,266][Main][INFO] - [train] Step 22750 out of 80000 | Loss --> 2.106 | Grad_l2 --> 0.386 | Weights_l2 --> 8927.846 | Lr --> 0.007 | Seconds_per_step --> 3.374 | [2024-08-10 10:14:49,150][Main][INFO] - [train] Step 22800 out of 80000 | Loss --> 2.119 | Grad_l2 --> 0.382 | Weights_l2 --> 8928.630 | Lr --> 0.007 | Seconds_per_step --> 3.378 | [2024-08-10 10:17:37,676][Main][INFO] - [train] Step 22850 out of 80000 | Loss --> 2.111 | Grad_l2 --> 0.383 | Weights_l2 --> 8929.421 | Lr --> 0.007 | Seconds_per_step --> 3.371 | [2024-08-10 10:20:27,046][Main][INFO] - [train] Step 22900 out of 80000 | Loss --> 2.111 | Grad_l2 --> 0.380 | Weights_l2 --> 8930.220 | Lr --> 0.007 | Seconds_per_step --> 3.387 | [2024-08-10 10:23:16,675][Main][INFO] - [train] Step 22950 out of 80000 | Loss --> 2.115 | Grad_l2 --> 0.383 | Weights_l2 --> 8931.007 | Lr --> 0.007 | Seconds_per_step --> 3.393 | [2024-08-10 10:26:05,843][Main][INFO] - [train] Step 23000 out of 80000 | Loss --> 2.120 | Grad_l2 --> 0.381 | Weights_l2 --> 8931.786 | Lr --> 0.007 | Seconds_per_step --> 3.383 | [2024-08-10 10:28:54,942][Main][INFO] - [train] Step 23050 out of 80000 | Loss --> 2.116 | Grad_l2 --> 0.386 | Weights_l2 --> 8932.580 | Lr --> 0.007 | Seconds_per_step --> 3.382 | [2024-08-10 10:31:44,834][Main][INFO] - [train] Step 23100 out of 80000 | Loss --> 2.115 | Grad_l2 --> 
0.380 | Weights_l2 --> 8933.380 | Lr --> 0.007 | Seconds_per_step --> 3.398 | [2024-08-10 10:34:34,786][Main][INFO] - [train] Step 23150 out of 80000 | Loss --> 2.113 | Grad_l2 --> 0.377 | Weights_l2 --> 8934.167 | Lr --> 0.007 | Seconds_per_step --> 3.399 | [2024-08-10 10:37:24,223][Main][INFO] - [train] Step 23200 out of 80000 | Loss --> 2.106 | Grad_l2 --> 0.377 | Weights_l2 --> 8934.945 | Lr --> 0.007 | Seconds_per_step --> 3.389 | [2024-08-10 10:40:13,606][Main][INFO] - [train] Step 23250 out of 80000 | Loss --> 2.109 | Grad_l2 --> 0.381 | Weights_l2 --> 8935.739 | Lr --> 0.007 | Seconds_per_step --> 3.388 | [2024-08-10 10:43:03,034][Main][INFO] - [train] Step 23300 out of 80000 | Loss --> 2.105 | Grad_l2 --> 0.380 | Weights_l2 --> 8936.510 | Lr --> 0.007 | Seconds_per_step --> 3.389 | [2024-08-10 10:45:52,352][Main][INFO] - [train] Step 23350 out of 80000 | Loss --> 2.115 | Grad_l2 --> 0.381 | Weights_l2 --> 8937.305 | Lr --> 0.007 | Seconds_per_step --> 3.386 | [2024-08-10 10:48:42,504][Main][INFO] - [train] Step 23400 out of 80000 | Loss --> 2.108 | Grad_l2 --> 0.381 | Weights_l2 --> 8938.113 | Lr --> 0.007 | Seconds_per_step --> 3.403 | [2024-08-10 10:51:32,165][Main][INFO] - [train] Step 23450 out of 80000 | Loss --> 2.107 | Grad_l2 --> 0.375 | Weights_l2 --> 8938.876 | Lr --> 0.007 | Seconds_per_step --> 3.393 | [2024-08-10 10:54:21,677][Main][INFO] - [train] Step 23500 out of 80000 | Loss --> 2.100 | Grad_l2 --> 0.380 | Weights_l2 --> 8939.651 | Lr --> 0.007 | Seconds_per_step --> 3.390 | [2024-08-10 10:57:10,744][Main][INFO] - [train] Step 23550 out of 80000 | Loss --> 2.101 | Grad_l2 --> 0.381 | Weights_l2 --> 8940.430 | Lr --> 0.007 | Seconds_per_step --> 3.381 | [2024-08-10 11:00:00,750][Main][INFO] - [train] Step 23600 out of 80000 | Loss --> 2.105 | Grad_l2 --> 0.376 | Weights_l2 --> 8941.201 | Lr --> 0.007 | Seconds_per_step --> 3.400 | [2024-08-10 11:02:49,980][Main][INFO] - [train] Step 23650 out of 80000 | Loss --> 2.106 | Grad_l2 --> 0.377 | 
Weights_l2 --> 8941.992 | Lr --> 0.007 | Seconds_per_step --> 3.385 | [2024-08-10 11:05:38,767][Main][INFO] - [train] Step 23700 out of 80000 | Loss --> 2.096 | Grad_l2 --> 0.380 | Weights_l2 --> 8942.754 | Lr --> 0.007 | Seconds_per_step --> 3.376 | [2024-08-10 11:08:28,650][Main][INFO] - [train] Step 23750 out of 80000 | Loss --> 2.096 | Grad_l2 --> 0.380 | Weights_l2 --> 8943.525 | Lr --> 0.007 | Seconds_per_step --> 3.398 | [2024-08-10 11:11:17,799][Main][INFO] - [train] Step 23800 out of 80000 | Loss --> 2.106 | Grad_l2 --> 0.378 | Weights_l2 --> 8944.294 | Lr --> 0.007 | Seconds_per_step --> 3.383 | [2024-08-10 11:14:06,944][Main][INFO] - [train] Step 23850 out of 80000 | Loss --> 2.095 | Grad_l2 --> 0.373 | Weights_l2 --> 8945.061 | Lr --> 0.007 | Seconds_per_step --> 3.383 | [2024-08-10 11:16:56,683][Main][INFO] - [train] Step 23900 out of 80000 | Loss --> 2.101 | Grad_l2 --> 0.376 | Weights_l2 --> 8945.835 | Lr --> 0.007 | Seconds_per_step --> 3.395 | [2024-08-10 11:19:45,844][Main][INFO] - [train] Step 23950 out of 80000 | Loss --> 2.092 | Grad_l2 --> 0.375 | Weights_l2 --> 8946.629 | Lr --> 0.007 | Seconds_per_step --> 3.383 | [2024-08-10 11:22:35,661][Main][INFO] - [train] Step 24000 out of 80000 | Loss --> 2.096 | Grad_l2 --> 0.377 | Weights_l2 --> 8947.382 | Lr --> 0.007 | Seconds_per_step --> 3.396 | [2024-08-10 11:25:23,611][Main][INFO] - [train] Step 24050 out of 80000 | Loss --> 2.094 | Grad_l2 --> 0.374 | Weights_l2 --> 8948.130 | Lr --> 0.007 | Seconds_per_step --> 3.359 | [2024-08-10 11:28:12,984][Main][INFO] - [train] Step 24100 out of 80000 | Loss --> 2.095 | Grad_l2 --> 0.373 | Weights_l2 --> 8948.867 | Lr --> 0.007 | Seconds_per_step --> 3.387 | [2024-08-10 11:31:01,571][Main][INFO] - [train] Step 24150 out of 80000 | Loss --> 2.095 | Grad_l2 --> 0.374 | Weights_l2 --> 8949.631 | Lr --> 0.007 | Seconds_per_step --> 3.372 | [2024-08-10 11:33:50,863][Main][INFO] - [train] Step 24200 out of 80000 | Loss --> 2.097 | Grad_l2 --> 0.376 | 
Weights_l2 --> 8950.388 | Lr --> 0.007 | Seconds_per_step --> 3.386 | [2024-08-10 11:36:40,686][Main][INFO] - [train] Step 24250 out of 80000 | Loss --> 2.096 | Grad_l2 --> 0.374 | Weights_l2 --> 8951.146 | Lr --> 0.007 | Seconds_per_step --> 3.396 | [2024-08-10 11:39:29,849][Main][INFO] - [train] Step 24300 out of 80000 | Loss --> 2.090 | Grad_l2 --> 0.373 | Weights_l2 --> 8951.859 | Lr --> 0.007 | Seconds_per_step --> 3.383 | [2024-08-10 11:42:19,157][Main][INFO] - [train] Step 24350 out of 80000 | Loss --> 2.097 | Grad_l2 --> 0.371 | Weights_l2 --> 8952.607 | Lr --> 0.007 | Seconds_per_step --> 3.386 | [2024-08-10 11:45:08,412][Main][INFO] - [train] Step 24400 out of 80000 | Loss --> 2.094 | Grad_l2 --> 0.372 | Weights_l2 --> 8953.362 | Lr --> 0.007 | Seconds_per_step --> 3.385 | [2024-08-10 11:47:57,713][Main][INFO] - [train] Step 24450 out of 80000 | Loss --> 2.091 | Grad_l2 --> 0.375 | Weights_l2 --> 8954.094 | Lr --> 0.007 | Seconds_per_step --> 3.386 | [2024-08-10 11:50:46,406][Main][INFO] - [train] Step 24500 out of 80000 | Loss --> 2.100 | Grad_l2 --> 0.369 | Weights_l2 --> 8954.854 | Lr --> 0.007 | Seconds_per_step --> 3.374 | [2024-08-10 11:53:35,339][Main][INFO] - [train] Step 24550 out of 80000 | Loss --> 2.110 | Grad_l2 --> 0.374 | Weights_l2 --> 8955.580 | Lr --> 0.007 | Seconds_per_step --> 3.379 | [2024-08-10 11:56:24,268][Main][INFO] - [train] Step 24600 out of 80000 | Loss --> 2.104 | Grad_l2 --> 0.375 | Weights_l2 --> 8956.344 | Lr --> 0.007 | Seconds_per_step --> 3.379 | [2024-08-10 11:59:13,863][Main][INFO] - [train] Step 24650 out of 80000 | Loss --> 2.103 | Grad_l2 --> 0.376 | Weights_l2 --> 8957.068 | Lr --> 0.007 | Seconds_per_step --> 3.392 | [2024-08-10 12:02:03,598][Main][INFO] - [train] Step 24700 out of 80000 | Loss --> 2.106 | Grad_l2 --> 0.370 | Weights_l2 --> 8957.814 | Lr --> 0.007 | Seconds_per_step --> 3.395 | [2024-08-10 12:04:52,269][Main][INFO] - [train] Step 24750 out of 80000 | Loss --> 2.107 | Grad_l2 --> 0.365 | 
Weights_l2 --> 8958.570 | Lr --> 0.007 | Seconds_per_step --> 3.373 | [2024-08-10 12:07:41,278][Main][INFO] - [train] Step 24800 out of 80000 | Loss --> 2.114 | Grad_l2 --> 0.373 | Weights_l2 --> 8959.279 | Lr --> 0.007 | Seconds_per_step --> 3.380 | [2024-08-10 12:10:31,555][Main][INFO] - [train] Step 24850 out of 80000 | Loss --> 2.110 | Grad_l2 --> 0.369 | Weights_l2 --> 8960.027 | Lr --> 0.007 | Seconds_per_step --> 3.406 | [2024-08-10 12:13:21,204][Main][INFO] - [train] Step 24900 out of 80000 | Loss --> 2.102 | Grad_l2 --> 0.372 | Weights_l2 --> 8960.746 | Lr --> 0.007 | Seconds_per_step --> 3.393 | [2024-08-10 12:16:10,885][Main][INFO] - [train] Step 24950 out of 80000 | Loss --> 2.114 | Grad_l2 --> 0.370 | Weights_l2 --> 8961.486 | Lr --> 0.007 | Seconds_per_step --> 3.394 | [2024-08-10 12:19:00,451][Main][INFO] - [train] Step 25000 out of 80000 | Loss --> 2.113 | Grad_l2 --> 0.372 | Weights_l2 --> 8962.205 | Lr --> 0.007 | Seconds_per_step --> 3.391 | [2024-08-10 12:19:00,451][accelerate.accelerator][INFO] - Saving current state to checkpoint-pt-25000 [2024-08-10 12:19:00,454][accelerate.utils.other][WARNING] - Removed shared tensor {'encoder.embed_tokens.weight', 'decoder.embed_tokens.weight'} while saving. 
This should be OK, but check by verifying that you don't receive any warning while reloading [2024-08-10 12:19:02,584][accelerate.checkpointing][INFO] - Model weights saved in checkpoint-pt-25000/model.safetensors [2024-08-10 12:19:05,471][accelerate.checkpointing][INFO] - Optimizer state saved in checkpoint-pt-25000/optimizer.bin [2024-08-10 12:19:05,472][accelerate.checkpointing][INFO] - Scheduler state saved in checkpoint-pt-25000/scheduler.bin [2024-08-10 12:19:05,472][accelerate.checkpointing][INFO] - Sampler state for dataloader 0 saved in checkpoint-pt-25000/sampler.bin [2024-08-10 12:19:05,472][accelerate.checkpointing][INFO] - Sampler state for dataloader 1 saved in checkpoint-pt-25000/sampler_1.bin [2024-08-10 12:19:05,473][accelerate.checkpointing][INFO] - Random states saved in checkpoint-pt-25000/random_states_0.pkl [2024-08-10 12:21:55,414][Main][INFO] - [train] Step 25050 out of 80000 | Loss --> 2.117 | Grad_l2 --> 0.368 | Weights_l2 --> 8962.926 | Lr --> 0.007 | Seconds_per_step --> 3.499 | [2024-08-10 12:24:44,641][Main][INFO] - [train] Step 25100 out of 80000 | Loss --> 2.108 | Grad_l2 --> 0.368 | Weights_l2 --> 8963.658 | Lr --> 0.007 | Seconds_per_step --> 3.385 | [2024-08-10 12:27:33,678][Main][INFO] - [train] Step 25150 out of 80000 | Loss --> 2.104 | Grad_l2 --> 0.370 | Weights_l2 --> 8964.369 | Lr --> 0.007 | Seconds_per_step --> 3.381 | [2024-08-10 12:30:22,703][Main][INFO] - [train] Step 25200 out of 80000 | Loss --> 2.102 | Grad_l2 --> 0.367 | Weights_l2 --> 8965.077 | Lr --> 0.007 | Seconds_per_step --> 3.380 | [2024-08-10 12:33:12,286][Main][INFO] - [train] Step 25250 out of 80000 | Loss --> 2.108 | Grad_l2 --> 0.367 | Weights_l2 --> 8965.794 | Lr --> 0.007 | Seconds_per_step --> 3.392 | [2024-08-10 12:36:00,779][Main][INFO] - [train] Step 25300 out of 80000 | Loss --> 2.107 | Grad_l2 --> 0.367 | Weights_l2 --> 8966.528 | Lr --> 0.007 | Seconds_per_step --> 3.370 | [2024-08-10 12:38:48,971][Main][INFO] - [train] Step 25350 out of 80000 
| Loss --> 2.107 | Grad_l2 --> 0.364 | Weights_l2 --> 8967.235 | Lr --> 0.007 | Seconds_per_step --> 3.364 | [2024-08-10 12:41:37,429][Main][INFO] - [train] Step 25400 out of 80000 | Loss --> 2.117 | Grad_l2 --> 0.363 | Weights_l2 --> 8967.925 | Lr --> 0.007 | Seconds_per_step --> 3.369 | [2024-08-10 12:44:26,521][Main][INFO] - [train] Step 25450 out of 80000 | Loss --> 2.110 | Grad_l2 --> 0.371 | Weights_l2 --> 8968.626 | Lr --> 0.007 | Seconds_per_step --> 3.382 | [2024-08-10 12:47:15,850][Main][INFO] - [train] Step 25500 out of 80000 | Loss --> 2.113 | Grad_l2 --> 0.368 | Weights_l2 --> 8969.323 | Lr --> 0.007 | Seconds_per_step --> 3.387 | [2024-08-10 12:50:05,229][Main][INFO] - [train] Step 25550 out of 80000 | Loss --> 2.106 | Grad_l2 --> 0.362 | Weights_l2 --> 8970.029 | Lr --> 0.007 | Seconds_per_step --> 3.388 | [2024-08-10 12:52:54,821][Main][INFO] - [train] Step 25600 out of 80000 | Loss --> 2.112 | Grad_l2 --> 0.365 | Weights_l2 --> 8970.711 | Lr --> 0.007 | Seconds_per_step --> 3.392 | [2024-08-10 12:55:44,920][Main][INFO] - [train] Step 25650 out of 80000 | Loss --> 2.116 | Grad_l2 --> 0.366 | Weights_l2 --> 8971.399 | Lr --> 0.007 | Seconds_per_step --> 3.402 | [2024-08-10 12:58:32,938][Main][INFO] - [train] Step 25700 out of 80000 | Loss --> 2.114 | Grad_l2 --> 0.364 | Weights_l2 --> 8972.067 | Lr --> 0.007 | Seconds_per_step --> 3.360 | [2024-08-10 13:01:22,907][Main][INFO] - [train] Step 25750 out of 80000 | Loss --> 2.124 | Grad_l2 --> 0.365 | Weights_l2 --> 8972.769 | Lr --> 0.007 | Seconds_per_step --> 3.399 | [2024-08-10 13:04:12,153][Main][INFO] - [train] Step 25800 out of 80000 | Loss --> 2.116 | Grad_l2 --> 0.365 | Weights_l2 --> 8973.450 | Lr --> 0.007 | Seconds_per_step --> 3.385 | [2024-08-10 13:07:02,172][Main][INFO] - [train] Step 25850 out of 80000 | Loss --> 2.118 | Grad_l2 --> 0.367 | Weights_l2 --> 8974.127 | Lr --> 0.007 | Seconds_per_step --> 3.400 | [2024-08-10 13:09:51,422][Main][INFO] - [train] Step 25900 out of 80000 | Loss 
--> 2.117 | Grad_l2 --> 0.365 | Weights_l2 --> 8974.808 | Lr --> 0.007 | Seconds_per_step --> 3.385 | [2024-08-10 13:12:40,893][Main][INFO] - [train] Step 25950 out of 80000 | Loss --> 2.119 | Grad_l2 --> 0.367 | Weights_l2 --> 8975.499 | Lr --> 0.007 | Seconds_per_step --> 3.389 | [2024-08-10 13:15:30,193][Main][INFO] - [train] Step 26000 out of 80000 | Loss --> 2.117 | Grad_l2 --> 0.365 | Weights_l2 --> 8976.191 | Lr --> 0.007 | Seconds_per_step --> 3.386 | [2024-08-10 13:18:19,215][Main][INFO] - [train] Step 26050 out of 80000 | Loss --> 2.105 | Grad_l2 --> 0.366 | Weights_l2 --> 8976.887 | Lr --> 0.007 | Seconds_per_step --> 3.380 | [2024-08-10 13:21:09,000][Main][INFO] - [train] Step 26100 out of 80000 | Loss --> 2.127 | Grad_l2 --> 0.367 | Weights_l2 --> 8977.570 | Lr --> 0.007 | Seconds_per_step --> 3.396 | [2024-08-10 13:23:57,054][Main][INFO] - [train] Step 26150 out of 80000 | Loss --> 2.112 | Grad_l2 --> 0.365 | Weights_l2 --> 8978.248 | Lr --> 0.007 | Seconds_per_step --> 3.361 | [2024-08-10 13:26:46,324][Main][INFO] - [train] Step 26200 out of 80000 | Loss --> 2.119 | Grad_l2 --> 0.362 | Weights_l2 --> 8978.920 | Lr --> 0.007 | Seconds_per_step --> 3.385 | [2024-08-10 13:29:35,823][Main][INFO] - [train] Step 26250 out of 80000 | Loss --> 2.121 | Grad_l2 --> 0.360 | Weights_l2 --> 8979.595 | Lr --> 0.007 | Seconds_per_step --> 3.390 | [2024-08-10 13:32:25,900][Main][INFO] - [train] Step 26300 out of 80000 | Loss --> 2.117 | Grad_l2 --> 0.360 | Weights_l2 --> 8980.264 | Lr --> 0.007 | Seconds_per_step --> 3.402 | [2024-08-10 13:35:14,993][Main][INFO] - [train] Step 26350 out of 80000 | Loss --> 2.111 | Grad_l2 --> 0.364 | Weights_l2 --> 8980.936 | Lr --> 0.007 | Seconds_per_step --> 3.382 | [2024-08-10 13:38:03,528][Main][INFO] - [train] Step 26400 out of 80000 | Loss --> 2.123 | Grad_l2 --> 0.364 | Weights_l2 --> 8981.629 | Lr --> 0.007 | Seconds_per_step --> 3.371 | [2024-08-10 13:40:55,925][Main][INFO] - [train] Step 26450 out of 80000 | Loss --> 
2.111 | Grad_l2 --> 0.363 | Weights_l2 --> 8982.292 | Lr --> 0.007 | Seconds_per_step --> 3.448 | [2024-08-10 13:43:46,071][Main][INFO] - [train] Step 26500 out of 80000 | Loss --> 2.122 | Grad_l2 --> 0.360 | Weights_l2 --> 8982.970 | Lr --> 0.007 | Seconds_per_step --> 3.403 | [2024-08-10 13:46:36,135][Main][INFO] - [train] Step 26550 out of 80000 | Loss --> 2.118 | Grad_l2 --> 0.362 | Weights_l2 --> 8983.635 | Lr --> 0.007 | Seconds_per_step --> 3.401 | [2024-08-10 13:49:25,847][Main][INFO] - [train] Step 26600 out of 80000 | Loss --> 2.119 | Grad_l2 --> 0.359 | Weights_l2 --> 8984.271 | Lr --> 0.007 | Seconds_per_step --> 3.394 | [2024-08-10 13:52:14,646][Main][INFO] - [train] Step 26650 out of 80000 | Loss --> 2.121 | Grad_l2 --> 0.359 | Weights_l2 --> 8984.935 | Lr --> 0.007 | Seconds_per_step --> 3.376 | [2024-08-10 13:55:04,238][Main][INFO] - [train] Step 26700 out of 80000 | Loss --> 2.119 | Grad_l2 --> 0.361 | Weights_l2 --> 8985.600 | Lr --> 0.007 | Seconds_per_step --> 3.392 | [2024-08-10 13:57:52,429][Main][INFO] - [train] Step 26750 out of 80000 | Loss --> 2.117 | Grad_l2 --> 0.358 | Weights_l2 --> 8986.254 | Lr --> 0.007 | Seconds_per_step --> 3.364 | [2024-08-10 14:00:40,663][Main][INFO] - [train] Step 26800 out of 80000 | Loss --> 2.120 | Grad_l2 --> 0.358 | Weights_l2 --> 8986.901 | Lr --> 0.007 | Seconds_per_step --> 3.365 | [2024-08-10 14:03:30,358][Main][INFO] - [train] Step 26850 out of 80000 | Loss --> 2.114 | Grad_l2 --> 0.356 | Weights_l2 --> 8987.561 | Lr --> 0.007 | Seconds_per_step --> 3.394 | [2024-08-10 14:06:20,974][Main][INFO] - [train] Step 26900 out of 80000 | Loss --> 2.107 | Grad_l2 --> 0.358 | Weights_l2 --> 8988.213 | Lr --> 0.007 | Seconds_per_step --> 3.412 | [2024-08-10 14:09:10,794][Main][INFO] - [train] Step 26950 out of 80000 | Loss --> 2.115 | Grad_l2 --> 0.355 | Weights_l2 --> 8988.873 | Lr --> 0.007 | Seconds_per_step --> 3.396 | [2024-08-10 14:11:59,378][Main][INFO] - [train] Step 27000 out of 80000 | Loss --> 2.114 | 
Grad_l2 --> 0.356 | Weights_l2 --> 8989.515 | Lr --> 0.007 | Seconds_per_step --> 3.372 | [2024-08-10 14:14:49,113][Main][INFO] - [train] Step 27050 out of 80000 | Loss --> 2.108 | Grad_l2 --> 0.358 | Weights_l2 --> 8990.140 | Lr --> 0.007 | Seconds_per_step --> 3.395 | [2024-08-10 14:17:39,074][Main][INFO] - [train] Step 27100 out of 80000 | Loss --> 2.115 | Grad_l2 --> 0.356 | Weights_l2 --> 8990.787 | Lr --> 0.007 | Seconds_per_step --> 3.399 | [2024-08-10 14:20:28,802][Main][INFO] - [train] Step 27150 out of 80000 | Loss --> 2.108 | Grad_l2 --> 0.359 | Weights_l2 --> 8991.446 | Lr --> 0.007 | Seconds_per_step --> 3.395 | [2024-08-10 14:23:18,386][Main][INFO] - [train] Step 27200 out of 80000 | Loss --> 2.119 | Grad_l2 --> 0.355 | Weights_l2 --> 8992.082 | Lr --> 0.007 | Seconds_per_step --> 3.392 | [2024-08-10 14:26:07,414][Main][INFO] - [train] Step 27250 out of 80000 | Loss --> 2.102 | Grad_l2 --> 0.356 | Weights_l2 --> 8992.722 | Lr --> 0.007 | Seconds_per_step --> 3.381 | [2024-08-10 14:28:57,816][Main][INFO] - [train] Step 27300 out of 80000 | Loss --> 2.099 | Grad_l2 --> 0.355 | Weights_l2 --> 8993.348 | Lr --> 0.007 | Seconds_per_step --> 3.408 | [2024-08-10 14:31:46,143][Main][INFO] - [train] Step 27350 out of 80000 | Loss --> 2.099 | Grad_l2 --> 0.356 | Weights_l2 --> 8993.988 | Lr --> 0.007 | Seconds_per_step --> 3.367 | [2024-08-10 14:34:34,674][Main][INFO] - [train] Step 27400 out of 80000 | Loss --> 2.106 | Grad_l2 --> 0.352 | Weights_l2 --> 8994.615 | Lr --> 0.007 | Seconds_per_step --> 3.371 | [2024-08-10 14:37:23,575][Main][INFO] - [train] Step 27450 out of 80000 | Loss --> 2.092 | Grad_l2 --> 0.350 | Weights_l2 --> 8995.217 | Lr --> 0.007 | Seconds_per_step --> 3.378 | [2024-08-10 14:40:13,983][Main][INFO] - [train] Step 27500 out of 80000 | Loss --> 2.097 | Grad_l2 --> 0.356 | Weights_l2 --> 8995.840 | Lr --> 0.007 | Seconds_per_step --> 3.408 | [2024-08-10 14:43:03,622][Main][INFO] - [train] Step 27550 out of 80000 | Loss --> 2.096 | Grad_l2 
--> 0.358 | Weights_l2 --> 8996.451 | Lr --> 0.007 | Seconds_per_step --> 3.393 | [2024-08-10 14:45:52,968][Main][INFO] - [train] Step 27600 out of 80000 | Loss --> 2.099 | Grad_l2 --> 0.351 | Weights_l2 --> 8997.061 | Lr --> 0.007 | Seconds_per_step --> 3.387 | [2024-08-10 14:48:41,052][Main][INFO] - [train] Step 27650 out of 80000 | Loss --> 2.096 | Grad_l2 --> 0.355 | Weights_l2 --> 8997.689 | Lr --> 0.007 | Seconds_per_step --> 3.362 | [2024-08-10 14:51:30,840][Main][INFO] - [train] Step 27700 out of 80000 | Loss --> 2.093 | Grad_l2 --> 0.353 | Weights_l2 --> 8998.303 | Lr --> 0.007 | Seconds_per_step --> 3.396 | [2024-08-10 14:54:19,140][Main][INFO] - [train] Step 27750 out of 80000 | Loss --> 2.095 | Grad_l2 --> 0.351 | Weights_l2 --> 8998.943 | Lr --> 0.007 | Seconds_per_step --> 3.366 | [2024-08-10 14:57:08,364][Main][INFO] - [train] Step 27800 out of 80000 | Loss --> 2.107 | Grad_l2 --> 0.350 | Weights_l2 --> 8999.548 | Lr --> 0.007 | Seconds_per_step --> 3.384 | [2024-08-10 14:59:56,874][Main][INFO] - [train] Step 27850 out of 80000 | Loss --> 2.103 | Grad_l2 --> 0.352 | Weights_l2 --> 9000.166 | Lr --> 0.007 | Seconds_per_step --> 3.370 | [2024-08-10 15:02:45,245][Main][INFO] - [train] Step 27900 out of 80000 | Loss --> 2.105 | Grad_l2 --> 0.351 | Weights_l2 --> 9000.752 | Lr --> 0.007 | Seconds_per_step --> 3.367 | [2024-08-10 15:05:34,878][Main][INFO] - [train] Step 27950 out of 80000 | Loss --> 2.090 | Grad_l2 --> 0.351 | Weights_l2 --> 9001.367 | Lr --> 0.007 | Seconds_per_step --> 3.393 | [2024-08-10 15:08:27,943][Main][INFO] - [train] Step 28000 out of 80000 | Loss --> 2.092 | Grad_l2 --> 0.353 | Weights_l2 --> 9001.952 | Lr --> 0.007 | Seconds_per_step --> 3.461 | [2024-08-10 15:11:16,686][Main][INFO] - [train] Step 28050 out of 80000 | Loss --> 2.095 | Grad_l2 --> 0.347 | Weights_l2 --> 9002.530 | Lr --> 0.007 | Seconds_per_step --> 3.375 | [2024-08-10 15:14:05,413][Main][INFO] - [train] Step 28100 out of 80000 | Loss --> 2.091 | Grad_l2 --> 
0.351 | Weights_l2 --> 9003.118 | Lr --> 0.007 | Seconds_per_step --> 3.375 | [2024-08-10 15:16:54,575][Main][INFO] - [train] Step 28150 out of 80000 | Loss --> 2.083 | Grad_l2 --> 0.352 | Weights_l2 --> 9003.707 | Lr --> 0.007 | Seconds_per_step --> 3.383 | [2024-08-10 15:19:42,657][Main][INFO] - [train] Step 28200 out of 80000 | Loss --> 2.089 | Grad_l2 --> 0.348 | Weights_l2 --> 9004.297 | Lr --> 0.007 | Seconds_per_step --> 3.362 | [2024-08-10 15:22:30,942][Main][INFO] - [train] Step 28250 out of 80000 | Loss --> 2.078 | Grad_l2 --> 0.352 | Weights_l2 --> 9004.874 | Lr --> 0.007 | Seconds_per_step --> 3.366 | [2024-08-10 15:25:19,094][Main][INFO] - [train] Step 28300 out of 80000 | Loss --> 2.085 | Grad_l2 --> 0.349 | Weights_l2 --> 9005.468 | Lr --> 0.007 | Seconds_per_step --> 3.363 | [2024-08-10 15:28:08,763][Main][INFO] - [train] Step 28350 out of 80000 | Loss --> 2.084 | Grad_l2 --> 0.348 | Weights_l2 --> 9006.043 | Lr --> 0.007 | Seconds_per_step --> 3.393 | [2024-08-10 15:30:57,721][Main][INFO] - [train] Step 28400 out of 80000 | Loss --> 2.084 | Grad_l2 --> 0.347 | Weights_l2 --> 9006.633 | Lr --> 0.007 | Seconds_per_step --> 3.379 | [2024-08-10 15:33:46,251][Main][INFO] - [train] Step 28450 out of 80000 | Loss --> 2.076 | Grad_l2 --> 0.351 | Weights_l2 --> 9007.186 | Lr --> 0.007 | Seconds_per_step --> 3.371 | [2024-08-10 15:36:34,855][Main][INFO] - [train] Step 28500 out of 80000 | Loss --> 2.074 | Grad_l2 --> 0.357 | Weights_l2 --> 9007.759 | Lr --> 0.007 | Seconds_per_step --> 3.372 | [2024-08-10 15:39:24,573][Main][INFO] - [train] Step 28550 out of 80000 | Loss --> 2.073 | Grad_l2 --> 0.350 | Weights_l2 --> 9008.346 | Lr --> 0.007 | Seconds_per_step --> 3.394 | [2024-08-10 15:42:13,485][Main][INFO] - [train] Step 28600 out of 80000 | Loss --> 2.068 | Grad_l2 --> 0.349 | Weights_l2 --> 9008.929 | Lr --> 0.007 | Seconds_per_step --> 3.378 | [2024-08-10 15:45:01,743][Main][INFO] - [train] Step 28650 out of 80000 | Loss --> 2.068 | Grad_l2 --> 0.348 | 
Weights_l2 --> 9009.504 | Lr --> 0.007 | Seconds_per_step --> 3.365 | [2024-08-10 15:47:50,368][Main][INFO] - [train] Step 28700 out of 80000 | Loss --> 2.072 | Grad_l2 --> 0.348 | Weights_l2 --> 9010.089 | Lr --> 0.007 | Seconds_per_step --> 3.372 | [2024-08-10 15:50:39,698][Main][INFO] - [train] Step 28750 out of 80000 | Loss --> 2.071 | Grad_l2 --> 0.345 | Weights_l2 --> 9010.647 | Lr --> 0.007 | Seconds_per_step --> 3.387 | [2024-08-10 15:53:28,221][Main][INFO] - [train] Step 28800 out of 80000 | Loss --> 2.069 | Grad_l2 --> 0.349 | Weights_l2 --> 9011.202 | Lr --> 0.007 | Seconds_per_step --> 3.370 | [2024-08-10 15:56:16,912][Main][INFO] - [train] Step 28850 out of 80000 | Loss --> 2.058 | Grad_l2 --> 0.344 | Weights_l2 --> 9011.796 | Lr --> 0.007 | Seconds_per_step --> 3.374 | [2024-08-10 15:59:05,519][Main][INFO] - [train] Step 28900 out of 80000 | Loss --> 2.063 | Grad_l2 --> 0.349 | Weights_l2 --> 9012.353 | Lr --> 0.007 | Seconds_per_step --> 3.372 | [2024-08-10 16:01:55,321][Main][INFO] - [train] Step 28950 out of 80000 | Loss --> 2.067 | Grad_l2 --> 0.347 | Weights_l2 --> 9012.927 | Lr --> 0.007 | Seconds_per_step --> 3.396 | [2024-08-10 16:04:43,967][Main][INFO] - [train] Step 29000 out of 80000 | Loss --> 2.064 | Grad_l2 --> 0.342 | Weights_l2 --> 9013.466 | Lr --> 0.007 | Seconds_per_step --> 3.373 | [2024-08-10 16:07:32,200][Main][INFO] - [train] Step 29050 out of 80000 | Loss --> 2.065 | Grad_l2 --> 0.349 | Weights_l2 --> 9014.025 | Lr --> 0.007 | Seconds_per_step --> 3.365 | [2024-08-10 16:10:20,259][Main][INFO] - [train] Step 29100 out of 80000 | Loss --> 2.068 | Grad_l2 --> 0.345 | Weights_l2 --> 9014.577 | Lr --> 0.007 | Seconds_per_step --> 3.361 | [2024-08-10 16:13:08,836][Main][INFO] - [train] Step 29150 out of 80000 | Loss --> 2.057 | Grad_l2 --> 0.343 | Weights_l2 --> 9015.114 | Lr --> 0.007 | Seconds_per_step --> 3.372 | [2024-08-10 16:15:57,855][Main][INFO] - [train] Step 29200 out of 80000 | Loss --> 2.056 | Grad_l2 --> 0.344 | 
Weights_l2 --> 9015.652 | Lr --> 0.007 | Seconds_per_step --> 3.380 | [2024-08-10 16:18:46,380][Main][INFO] - [train] Step 29250 out of 80000 | Loss --> 2.052 | Grad_l2 --> 0.344 | Weights_l2 --> 9016.179 | Lr --> 0.007 | Seconds_per_step --> 3.370 | [2024-08-10 16:21:35,062][Main][INFO] - [train] Step 29300 out of 80000 | Loss --> 2.049 | Grad_l2 --> 0.344 | Weights_l2 --> 9016.760 | Lr --> 0.007 | Seconds_per_step --> 3.374 | [2024-08-10 16:24:23,369][Main][INFO] - [train] Step 29350 out of 80000 | Loss --> 2.059 | Grad_l2 --> 0.346 | Weights_l2 --> 9017.287 | Lr --> 0.007 | Seconds_per_step --> 3.366 | [2024-08-10 16:27:12,911][Main][INFO] - [train] Step 29400 out of 80000 | Loss --> 2.062 | Grad_l2 --> 0.345 | Weights_l2 --> 9017.828 | Lr --> 0.007 | Seconds_per_step --> 3.391 | [2024-08-10 16:30:01,803][Main][INFO] - [train] Step 29450 out of 80000 | Loss --> 2.050 | Grad_l2 --> 0.341 | Weights_l2 --> 9018.397 | Lr --> 0.007 | Seconds_per_step --> 3.378 | [2024-08-10 16:32:50,350][Main][INFO] - [train] Step 29500 out of 80000 | Loss --> 2.042 | Grad_l2 --> 0.348 | Weights_l2 --> 9018.920 | Lr --> 0.007 | Seconds_per_step --> 3.371 | [2024-08-10 16:35:38,777][Main][INFO] - [train] Step 29550 out of 80000 | Loss --> 2.054 | Grad_l2 --> 0.340 | Weights_l2 --> 9019.463 | Lr --> 0.007 | Seconds_per_step --> 3.369 | [2024-08-10 16:38:27,972][Main][INFO] - [train] Step 29600 out of 80000 | Loss --> 2.051 | Grad_l2 --> 0.342 | Weights_l2 --> 9020.006 | Lr --> 0.007 | Seconds_per_step --> 3.384 | [2024-08-10 16:41:16,224][Main][INFO] - [train] Step 29650 out of 80000 | Loss --> 2.047 | Grad_l2 --> 0.340 | Weights_l2 --> 9020.522 | Lr --> 0.007 | Seconds_per_step --> 3.365 | [2024-08-10 16:44:04,496][Main][INFO] - [train] Step 29700 out of 80000 | Loss --> 2.045 | Grad_l2 --> 0.341 | Weights_l2 --> 9021.050 | Lr --> 0.007 | Seconds_per_step --> 3.365 | [2024-08-10 16:46:53,269][Main][INFO] - [train] Step 29750 out of 80000 | Loss --> 2.049 | Grad_l2 --> 0.341 | 
Weights_l2 --> 9021.579 | Lr --> 0.007 | Seconds_per_step --> 3.375 | [2024-08-10 16:49:48,011][Main][INFO] - [train] Step 29800 out of 80000 | Loss --> 2.040 | Grad_l2 --> 0.340 | Weights_l2 --> 9022.114 | Lr --> 0.007 | Seconds_per_step --> 3.495 | [2024-08-10 16:52:36,978][Main][INFO] - [train] Step 29850 out of 80000 | Loss --> 2.048 | Grad_l2 --> 0.339 | Weights_l2 --> 9022.615 | Lr --> 0.007 | Seconds_per_step --> 3.379 | [2024-08-10 16:55:25,738][Main][INFO] - [train] Step 29900 out of 80000 | Loss --> 2.038 | Grad_l2 --> 0.339 | Weights_l2 --> 9023.139 | Lr --> 0.007 | Seconds_per_step --> 3.375 | [2024-08-10 16:58:14,275][Main][INFO] - [train] Step 29950 out of 80000 | Loss --> 2.042 | Grad_l2 --> 0.340 | Weights_l2 --> 9023.652 | Lr --> 0.007 | Seconds_per_step --> 3.371 | [2024-08-10 17:01:03,660][Main][INFO] - [train] Step 30000 out of 80000 | Loss --> 2.038 | Grad_l2 --> 0.341 | Weights_l2 --> 9024.174 | Lr --> 0.006 | Seconds_per_step --> 3.388 | [2024-08-10 17:01:03,660][accelerate.accelerator][INFO] - Saving current state to checkpoint-pt-30000 [2024-08-10 17:01:03,664][accelerate.utils.other][WARNING] - Removed shared tensor {'encoder.embed_tokens.weight', 'decoder.embed_tokens.weight'} while saving. 
This should be OK, but check by verifying that you don't receive any warning while reloading [2024-08-10 17:01:05,567][accelerate.checkpointing][INFO] - Model weights saved in checkpoint-pt-30000/model.safetensors [2024-08-10 17:01:08,375][accelerate.checkpointing][INFO] - Optimizer state saved in checkpoint-pt-30000/optimizer.bin [2024-08-10 17:01:08,375][accelerate.checkpointing][INFO] - Scheduler state saved in checkpoint-pt-30000/scheduler.bin [2024-08-10 17:01:08,375][accelerate.checkpointing][INFO] - Sampler state for dataloader 0 saved in checkpoint-pt-30000/sampler.bin [2024-08-10 17:01:08,375][accelerate.checkpointing][INFO] - Sampler state for dataloader 1 saved in checkpoint-pt-30000/sampler_1.bin [2024-08-10 17:01:08,376][accelerate.checkpointing][INFO] - Random states saved in checkpoint-pt-30000/random_states_0.pkl [2024-08-10 17:03:56,367][Main][INFO] - [train] Step 30050 out of 80000 | Loss --> 2.033 | Grad_l2 --> 0.340 | Weights_l2 --> 9024.673 | Lr --> 0.006 | Seconds_per_step --> 3.454 | [2024-08-10 17:06:45,781][Main][INFO] - [train] Step 30100 out of 80000 | Loss --> 2.026 | Grad_l2 --> 0.336 | Weights_l2 --> 9025.193 | Lr --> 0.006 | Seconds_per_step --> 3.388 | [2024-08-10 17:09:35,059][Main][INFO] - [train] Step 30150 out of 80000 | Loss --> 2.029 | Grad_l2 --> 0.339 | Weights_l2 --> 9025.688 | Lr --> 0.006 | Seconds_per_step --> 3.386 | [2024-08-10 17:12:24,449][Main][INFO] - [train] Step 30200 out of 80000 | Loss --> 2.029 | Grad_l2 --> 0.337 | Weights_l2 --> 9026.206 | Lr --> 0.006 | Seconds_per_step --> 3.388 | [2024-08-10 17:15:13,224][Main][INFO] - [train] Step 30250 out of 80000 | Loss --> 2.030 | Grad_l2 --> 0.338 | Weights_l2 --> 9026.710 | Lr --> 0.006 | Seconds_per_step --> 3.375 | [2024-08-10 17:18:01,694][Main][INFO] - [train] Step 30300 out of 80000 | Loss --> 2.025 | Grad_l2 --> 0.339 | Weights_l2 --> 9027.226 | Lr --> 0.006 | Seconds_per_step --> 3.369 | [2024-08-10 17:20:50,461][Main][INFO] - [train] Step 30350 out of 80000 
| Loss --> 2.023 | Grad_l2 --> 0.338 | Weights_l2 --> 9027.736 | Lr --> 0.006 | Seconds_per_step --> 3.375 | [2024-08-10 17:23:39,856][Main][INFO] - [train] Step 30400 out of 80000 | Loss --> 2.025 | Grad_l2 --> 0.336 | Weights_l2 --> 9028.239 | Lr --> 0.006 | Seconds_per_step --> 3.388 | [2024-08-10 17:26:28,410][Main][INFO] - [train] Step 30450 out of 80000 | Loss --> 2.028 | Grad_l2 --> 0.337 | Weights_l2 --> 9028.741 | Lr --> 0.006 | Seconds_per_step --> 3.371 | [2024-08-10 17:29:17,353][Main][INFO] - [train] Step 30500 out of 80000 | Loss --> 2.020 | Grad_l2 --> 0.337 | Weights_l2 --> 9029.242 | Lr --> 0.006 | Seconds_per_step --> 3.379 | [2024-08-10 17:32:06,838][Main][INFO] - [train] Step 30550 out of 80000 | Loss --> 2.022 | Grad_l2 --> 0.336 | Weights_l2 --> 9029.748 | Lr --> 0.006 | Seconds_per_step --> 3.390 | [2024-08-10 17:34:56,133][Main][INFO] - [train] Step 30600 out of 80000 | Loss --> 2.020 | Grad_l2 --> 0.338 | Weights_l2 --> 9030.233 | Lr --> 0.006 | Seconds_per_step --> 3.386 | [2024-08-10 17:37:44,605][Main][INFO] - [train] Step 30650 out of 80000 | Loss --> 2.020 | Grad_l2 --> 0.335 | Weights_l2 --> 9030.723 | Lr --> 0.006 | Seconds_per_step --> 3.369 | [2024-08-10 17:40:36,273][Main][INFO] - [train] Step 30700 out of 80000 | Loss --> 2.022 | Grad_l2 --> 0.336 | Weights_l2 --> 9031.205 | Lr --> 0.006 | Seconds_per_step --> 3.433 | [2024-08-10 17:43:24,760][Main][INFO] - [train] Step 30750 out of 80000 | Loss --> 2.009 | Grad_l2 --> 0.338 | Weights_l2 --> 9031.702 | Lr --> 0.006 | Seconds_per_step --> 3.370 | [2024-08-10 17:46:13,103][Main][INFO] - [train] Step 30800 out of 80000 | Loss --> 2.010 | Grad_l2 --> 0.336 | Weights_l2 --> 9032.180 | Lr --> 0.006 | Seconds_per_step --> 3.367 | [2024-08-10 17:49:02,171][Main][INFO] - [train] Step 30850 out of 80000 | Loss --> 2.016 | Grad_l2 --> 0.337 | Weights_l2 --> 9032.684 | Lr --> 0.006 | Seconds_per_step --> 3.381 | [2024-08-10 17:51:50,797][Main][INFO] - [train] Step 30900 out of 80000 | Loss 
--> 2.008 | Grad_l2 --> 0.333 | Weights_l2 --> 9033.166 | Lr --> 0.006 | Seconds_per_step --> 3.373 | [2024-08-10 17:54:39,186][Main][INFO] - [train] Step 30950 out of 80000 | Loss --> 2.004 | Grad_l2 --> 0.335 | Weights_l2 --> 9033.641 | Lr --> 0.006 | Seconds_per_step --> 3.368 | [2024-08-10 17:57:28,172][Main][INFO] - [train] Step 31000 out of 80000 | Loss --> 2.003 | Grad_l2 --> 0.333 | Weights_l2 --> 9034.106 | Lr --> 0.006 | Seconds_per_step --> 3.380 | [2024-08-10 18:00:17,752][Main][INFO] - [train] Step 31050 out of 80000 | Loss --> 2.001 | Grad_l2 --> 0.334 | Weights_l2 --> 9034.578 | Lr --> 0.006 | Seconds_per_step --> 3.392 | [2024-08-10 18:03:05,833][Main][INFO] - [train] Step 31100 out of 80000 | Loss --> 1.995 | Grad_l2 --> 0.333 | Weights_l2 --> 9035.048 | Lr --> 0.006 | Seconds_per_step --> 3.362 | [2024-08-10 18:05:57,136][Main][INFO] - [train] Step 31150 out of 80000 | Loss --> 2.004 | Grad_l2 --> 0.330 | Weights_l2 --> 9035.529 | Lr --> 0.006 | Seconds_per_step --> 3.426 | [2024-08-10 18:08:59,939][Main][INFO] - [train] Step 31200 out of 80000 | Loss --> 1.997 | Grad_l2 --> 0.336 | Weights_l2 --> 9036.006 | Lr --> 0.006 | Seconds_per_step --> 3.656 | [2024-08-10 18:11:52,395][Main][INFO] - [train] Step 31250 out of 80000 | Loss --> 1.994 | Grad_l2 --> 0.333 | Weights_l2 --> 9036.471 | Lr --> 0.006 | Seconds_per_step --> 3.449 | [2024-08-10 18:14:40,969][Main][INFO] - [train] Step 31300 out of 80000 | Loss --> 1.985 | Grad_l2 --> 0.333 | Weights_l2 --> 9036.920 | Lr --> 0.006 | Seconds_per_step --> 3.371 | [2024-08-10 18:17:29,783][Main][INFO] - [train] Step 31350 out of 80000 | Loss --> 1.987 | Grad_l2 --> 0.330 | Weights_l2 --> 9037.357 | Lr --> 0.006 | Seconds_per_step --> 3.376 | [2024-08-10 18:20:21,690][Main][INFO] - [train] Step 31400 out of 80000 | Loss --> 2.000 | Grad_l2 --> 0.329 | Weights_l2 --> 9037.811 | Lr --> 0.006 | Seconds_per_step --> 3.438 | [2024-08-10 18:23:11,088][Main][INFO] - [train] Step 31450 out of 80000 | Loss --> 
1.986 | Grad_l2 --> 0.333 | Weights_l2 --> 9038.266 | Lr --> 0.006 | Seconds_per_step --> 3.388 | [2024-08-10 18:26:00,153][Main][INFO] - [train] Step 31500 out of 80000 | Loss --> 1.989 | Grad_l2 --> 0.330 | Weights_l2 --> 9038.727 | Lr --> 0.006 | Seconds_per_step --> 3.381 | [2024-08-10 18:28:53,003][Main][INFO] - [train] Step 31550 out of 80000 | Loss --> 1.992 | Grad_l2 --> 0.328 | Weights_l2 --> 9039.170 | Lr --> 0.006 | Seconds_per_step --> 3.457 | [2024-08-10 18:32:18,339][Main][INFO] - [train] Step 31600 out of 80000 | Loss --> 1.980 | Grad_l2 --> 0.334 | Weights_l2 --> 9039.624 | Lr --> 0.006 | Seconds_per_step --> 4.107 | [2024-08-10 18:35:13,944][Main][INFO] - [train] Step 31650 out of 80000 | Loss --> 1.979 | Grad_l2 --> 0.331 | Weights_l2 --> 9040.077 | Lr --> 0.006 | Seconds_per_step --> 3.512 | [2024-08-10 18:38:16,891][Main][INFO] - [train] Step 31700 out of 80000 | Loss --> 1.982 | Grad_l2 --> 0.333 | Weights_l2 --> 9040.502 | Lr --> 0.006 | Seconds_per_step --> 3.659 | [2024-08-10 18:41:07,083][Main][INFO] - [train] Step 31750 out of 80000 | Loss --> 1.974 | Grad_l2 --> 0.330 | Weights_l2 --> 9040.956 | Lr --> 0.006 | Seconds_per_step --> 3.404 | [2024-08-10 18:44:09,382][Main][INFO] - [train] Step 31800 out of 80000 | Loss --> 1.975 | Grad_l2 --> 0.331 | Weights_l2 --> 9041.401 | Lr --> 0.006 | Seconds_per_step --> 3.646 | [2024-08-10 18:47:08,204][Main][INFO] - [train] Step 31850 out of 80000 | Loss --> 1.977 | Grad_l2 --> 0.327 | Weights_l2 --> 9041.844 | Lr --> 0.006 | Seconds_per_step --> 3.576 | [2024-08-10 18:49:56,651][Main][INFO] - [train] Step 31900 out of 80000 | Loss --> 1.979 | Grad_l2 --> 0.327 | Weights_l2 --> 9042.293 | Lr --> 0.006 | Seconds_per_step --> 3.369 | [2024-08-10 18:52:45,087][Main][INFO] - [train] Step 31950 out of 80000 | Loss --> 1.983 | Grad_l2 --> 0.328 | Weights_l2 --> 9042.741 | Lr --> 0.006 | Seconds_per_step --> 3.369 | [2024-08-10 18:55:33,129][Main][INFO] - [train] Step 32000 out of 80000 | Loss --> 1.972 | 
Grad_l2 --> 0.328 | Weights_l2 --> 9043.156 | Lr --> 0.006 | Seconds_per_step --> 3.361 | [2024-08-10 18:58:21,966][Main][INFO] - [train] Step 32050 out of 80000 | Loss --> 1.974 | Grad_l2 --> 0.332 | Weights_l2 --> 9043.615 | Lr --> 0.006 | Seconds_per_step --> 3.377 | [2024-08-10 19:01:10,644][Main][INFO] - [train] Step 32100 out of 80000 | Loss --> 1.971 | Grad_l2 --> 0.336 | Weights_l2 --> 9044.048 | Lr --> 0.006 | Seconds_per_step --> 3.374 | [2024-08-10 19:03:59,925][Main][INFO] - [train] Step 32150 out of 80000 | Loss --> 1.975 | Grad_l2 --> 0.329 | Weights_l2 --> 9044.462 | Lr --> 0.006 | Seconds_per_step --> 3.386 | [2024-08-10 19:06:48,343][Main][INFO] - [train] Step 32200 out of 80000 | Loss --> 1.979 | Grad_l2 --> 0.329 | Weights_l2 --> 9044.883 | Lr --> 0.006 | Seconds_per_step --> 3.368 | [2024-08-10 19:09:37,807][Main][INFO] - [train] Step 32250 out of 80000 | Loss --> 1.966 | Grad_l2 --> 0.329 | Weights_l2 --> 9045.314 | Lr --> 0.006 | Seconds_per_step --> 3.389 | [2024-08-10 19:12:25,582][Main][INFO] - [train] Step 32300 out of 80000 | Loss --> 1.975 | Grad_l2 --> 0.326 | Weights_l2 --> 9045.752 | Lr --> 0.006 | Seconds_per_step --> 3.355 | [2024-08-10 19:15:14,152][Main][INFO] - [train] Step 32350 out of 80000 | Loss --> 1.972 | Grad_l2 --> 0.332 | Weights_l2 --> 9046.174 | Lr --> 0.006 | Seconds_per_step --> 3.371 | [2024-08-10 19:18:02,528][Main][INFO] - [train] Step 32400 out of 80000 | Loss --> 1.971 | Grad_l2 --> 0.326 | Weights_l2 --> 9046.586 | Lr --> 0.006 | Seconds_per_step --> 3.368 | [2024-08-10 19:20:51,357][Main][INFO] - [train] Step 32450 out of 80000 | Loss --> 1.966 | Grad_l2 --> 0.328 | Weights_l2 --> 9046.976 | Lr --> 0.006 | Seconds_per_step --> 3.377 | [2024-08-10 19:23:41,166][Main][INFO] - [train] Step 32500 out of 80000 | Loss --> 1.976 | Grad_l2 --> 0.332 | Weights_l2 --> 9047.423 | Lr --> 0.006 | Seconds_per_step --> 3.396 | [2024-08-10 19:26:30,513][Main][INFO] - [train] Step 32550 out of 80000 | Loss --> 1.962 | Grad_l2 
--> 0.325 | Weights_l2 --> 9047.840 | Lr --> 0.006 | Seconds_per_step --> 3.387 | [2024-08-10 19:29:18,382][Main][INFO] - [train] Step 32600 out of 80000 | Loss --> 1.957 | Grad_l2 --> 0.327 | Weights_l2 --> 9048.263 | Lr --> 0.006 | Seconds_per_step --> 3.357 | [2024-08-10 19:32:06,781][Main][INFO] - [train] Step 32650 out of 80000 | Loss --> 1.972 | Grad_l2 --> 0.332 | Weights_l2 --> 9048.703 | Lr --> 0.006 | Seconds_per_step --> 3.368 | [2024-08-10 19:34:55,944][Main][INFO] - [train] Step 32700 out of 80000 | Loss --> 1.968 | Grad_l2 --> 0.326 | Weights_l2 --> 9049.119 | Lr --> 0.006 | Seconds_per_step --> 3.383 | [2024-08-10 19:37:44,560][Main][INFO] - [train] Step 32750 out of 80000 | Loss --> 1.971 | Grad_l2 --> 0.331 | Weights_l2 --> 9049.551 | Lr --> 0.006 | Seconds_per_step --> 3.372 | [2024-08-10 19:40:33,293][Main][INFO] - [train] Step 32800 out of 80000 | Loss --> 1.967 | Grad_l2 --> 0.327 | Weights_l2 --> 9049.957 | Lr --> 0.006 | Seconds_per_step --> 3.375 | [2024-08-10 19:43:22,355][Main][INFO] - [train] Step 32850 out of 80000 | Loss --> 1.964 | Grad_l2 --> 0.328 | Weights_l2 --> 9050.372 | Lr --> 0.006 | Seconds_per_step --> 3.381 | [2024-08-10 19:46:12,626][Main][INFO] - [train] Step 32900 out of 80000 | Loss --> 1.964 | Grad_l2 --> 0.326 | Weights_l2 --> 9050.775 | Lr --> 0.006 | Seconds_per_step --> 3.405 | [2024-08-10 19:49:01,476][Main][INFO] - [train] Step 32950 out of 80000 | Loss --> 1.965 | Grad_l2 --> 0.327 | Weights_l2 --> 9051.194 | Lr --> 0.006 | Seconds_per_step --> 3.377 | [2024-08-10 19:51:50,548][Main][INFO] - [train] Step 33000 out of 80000 | Loss --> 1.960 | Grad_l2 --> 0.325 | Weights_l2 --> 9051.604 | Lr --> 0.006 | Seconds_per_step --> 3.381 | [2024-08-10 19:54:38,786][Main][INFO] - [train] Step 33050 out of 80000 | Loss --> 1.958 | Grad_l2 --> 0.326 | Weights_l2 --> 9052.006 | Lr --> 0.006 | Seconds_per_step --> 3.365 | [2024-08-10 19:57:28,433][Main][INFO] - [train] Step 33100 out of 80000 | Loss --> 1.957 | Grad_l2 --> 
0.326 | Weights_l2 --> 9052.404 | Lr --> 0.006 | Seconds_per_step --> 3.393 | [2024-08-10 20:00:16,265][Main][INFO] - [train] Step 33150 out of 80000 | Loss --> 1.955 | Grad_l2 --> 0.324 | Weights_l2 --> 9052.805 | Lr --> 0.006 | Seconds_per_step --> 3.357 | [2024-08-10 20:03:05,196][Main][INFO] - [train] Step 33200 out of 80000 | Loss --> 1.955 | Grad_l2 --> 0.328 | Weights_l2 --> 9053.209 | Lr --> 0.006 | Seconds_per_step --> 3.379 | [2024-08-10 20:05:53,340][Main][INFO] - [train] Step 33250 out of 80000 | Loss --> 1.949 | Grad_l2 --> 0.328 | Weights_l2 --> 9053.604 | Lr --> 0.006 | Seconds_per_step --> 3.363 | [2024-08-10 20:08:41,826][Main][INFO] - [train] Step 33300 out of 80000 | Loss --> 1.952 | Grad_l2 --> 0.325 | Weights_l2 --> 9054.000 | Lr --> 0.006 | Seconds_per_step --> 3.370 | [2024-08-10 20:11:30,678][Main][INFO] - [train] Step 33350 out of 80000 | Loss --> 1.948 | Grad_l2 --> 0.326 | Weights_l2 --> 9054.393 | Lr --> 0.006 | Seconds_per_step --> 3.377 | [2024-08-10 20:14:19,885][Main][INFO] - [train] Step 33400 out of 80000 | Loss --> 1.943 | Grad_l2 --> 0.326 | Weights_l2 --> 9054.797 | Lr --> 0.006 | Seconds_per_step --> 3.384 | [2024-08-10 20:17:08,133][Main][INFO] - [train] Step 33450 out of 80000 | Loss --> 1.952 | Grad_l2 --> 0.323 | Weights_l2 --> 9055.182 | Lr --> 0.006 | Seconds_per_step --> 3.365 | [2024-08-10 20:19:57,277][Main][INFO] - [train] Step 33500 out of 80000 | Loss --> 1.954 | Grad_l2 --> 0.323 | Weights_l2 --> 9055.580 | Lr --> 0.006 | Seconds_per_step --> 3.383 | [2024-08-10 20:22:45,534][Main][INFO] - [train] Step 33550 out of 80000 | Loss --> 1.948 | Grad_l2 --> 0.325 | Weights_l2 --> 9055.965 | Lr --> 0.006 | Seconds_per_step --> 3.365 | [2024-08-10 20:25:34,304][Main][INFO] - [train] Step 33600 out of 80000 | Loss --> 1.952 | Grad_l2 --> 0.322 | Weights_l2 --> 9056.338 | Lr --> 0.006 | Seconds_per_step --> 3.375 | [2024-08-10 20:28:22,563][Main][INFO] - [train] Step 33650 out of 80000 | Loss --> 1.948 | Grad_l2 --> 0.326 | 
Weights_l2 --> 9056.705 | Lr --> 0.006 | Seconds_per_step --> 3.365 | [2024-08-10 20:31:11,603][Main][INFO] - [train] Step 33700 out of 80000 | Loss --> 1.956 | Grad_l2 --> 0.323 | Weights_l2 --> 9057.105 | Lr --> 0.006 | Seconds_per_step --> 3.381 | [2024-08-10 20:34:00,728][Main][INFO] - [train] Step 33750 out of 80000 | Loss --> 1.942 | Grad_l2 --> 0.324 | Weights_l2 --> 9057.475 | Lr --> 0.006 | Seconds_per_step --> 3.383 | [2024-08-10 20:36:50,006][Main][INFO] - [train] Step 33800 out of 80000 | Loss --> 1.954 | Grad_l2 --> 0.327 | Weights_l2 --> 9057.854 | Lr --> 0.006 | Seconds_per_step --> 3.386 | [2024-08-10 20:39:38,800][Main][INFO] - [train] Step 33850 out of 80000 | Loss --> 1.958 | Grad_l2 --> 0.324 | Weights_l2 --> 9058.234 | Lr --> 0.006 | Seconds_per_step --> 3.376 | [2024-08-10 20:42:27,584][Main][INFO] - [train] Step 33900 out of 80000 | Loss --> 1.946 | Grad_l2 --> 0.326 | Weights_l2 --> 9058.594 | Lr --> 0.006 | Seconds_per_step --> 3.376 | [2024-08-10 20:45:16,609][Main][INFO] - [train] Step 33950 out of 80000 | Loss --> 1.950 | Grad_l2 --> 0.328 | Weights_l2 --> 9058.969 | Lr --> 0.006 | Seconds_per_step --> 3.381 | [2024-08-10 20:48:07,179][Main][INFO] - [train] Step 34000 out of 80000 | Loss --> 1.962 | Grad_l2 --> 0.324 | Weights_l2 --> 9059.358 | Lr --> 0.006 | Seconds_per_step --> 3.411 | [2024-08-10 20:50:56,670][Main][INFO] - [train] Step 34050 out of 80000 | Loss --> 1.948 | Grad_l2 --> 0.325 | Weights_l2 --> 9059.744 | Lr --> 0.006 | Seconds_per_step --> 3.390 | [2024-08-10 20:53:57,081][Main][INFO] - [train] Step 34100 out of 80000 | Loss --> 1.950 | Grad_l2 --> 0.323 | Weights_l2 --> 9060.114 | Lr --> 0.006 | Seconds_per_step --> 3.608 | [2024-08-10 20:57:04,451][Main][INFO] - [train] Step 34150 out of 80000 | Loss --> 1.945 | Grad_l2 --> 0.322 | Weights_l2 --> 9060.487 | Lr --> 0.006 | Seconds_per_step --> 3.747 | [2024-08-10 20:59:52,938][Main][INFO] - [train] Step 34200 out of 80000 | Loss --> 1.944 | Grad_l2 --> 0.323 | 
Weights_l2 --> 9060.857 | Lr --> 0.006 | Seconds_per_step --> 3.370 | [2024-08-10 21:02:41,472][Main][INFO] - [train] Step 34250 out of 80000 | Loss --> 1.948 | Grad_l2 --> 0.327 | Weights_l2 --> 9061.209 | Lr --> 0.006 | Seconds_per_step --> 3.371 | [2024-08-10 21:05:30,147][Main][INFO] - [train] Step 34300 out of 80000 | Loss --> 1.943 | Grad_l2 --> 0.323 | Weights_l2 --> 9061.579 | Lr --> 0.006 | Seconds_per_step --> 3.373 | [2024-08-10 21:08:19,894][Main][INFO] - [train] Step 34350 out of 80000 | Loss --> 1.949 | Grad_l2 --> 0.323 | Weights_l2 --> 9061.930 | Lr --> 0.006 | Seconds_per_step --> 3.395 | [2024-08-10 21:11:08,680][Main][INFO] - [train] Step 34400 out of 80000 | Loss --> 1.954 | Grad_l2 --> 0.324 | Weights_l2 --> 9062.305 | Lr --> 0.006 | Seconds_per_step --> 3.376 | [2024-08-10 21:13:58,108][Main][INFO] - [train] Step 34450 out of 80000 | Loss --> 1.950 | Grad_l2 --> 0.323 | Weights_l2 --> 9062.682 | Lr --> 0.006 | Seconds_per_step --> 3.389 | [2024-08-10 21:16:57,011][Main][INFO] - [train] Step 34500 out of 80000 | Loss --> 1.953 | Grad_l2 --> 0.322 | Weights_l2 --> 9063.034 | Lr --> 0.006 | Seconds_per_step --> 3.578 | [2024-08-10 21:20:03,124][Main][INFO] - [train] Step 34550 out of 80000 | Loss --> 1.956 | Grad_l2 --> 0.323 | Weights_l2 --> 9063.400 | Lr --> 0.006 | Seconds_per_step --> 3.722 | [2024-08-10 21:22:52,395][Main][INFO] - [train] Step 34600 out of 80000 | Loss --> 1.951 | Grad_l2 --> 0.322 | Weights_l2 --> 9063.733 | Lr --> 0.006 | Seconds_per_step --> 3.385 | [2024-08-10 21:25:41,187][Main][INFO] - [train] Step 34650 out of 80000 | Loss --> 1.949 | Grad_l2 --> 0.326 | Weights_l2 --> 9064.106 | Lr --> 0.006 | Seconds_per_step --> 3.376 | [2024-08-10 21:28:30,675][Main][INFO] - [train] Step 34700 out of 80000 | Loss --> 1.965 | Grad_l2 --> 0.326 | Weights_l2 --> 9064.465 | Lr --> 0.006 | Seconds_per_step --> 3.390 | [2024-08-10 21:31:20,544][Main][INFO] - [train] Step 34750 out of 80000 | Loss --> 1.953 | Grad_l2 --> 0.322 | 
Weights_l2 --> 9064.833 | Lr --> 0.006 | Seconds_per_step --> 3.397 | [2024-08-10 21:34:09,343][Main][INFO] - [train] Step 34800 out of 80000 | Loss --> 1.959 | Grad_l2 --> 0.323 | Weights_l2 --> 9065.209 | Lr --> 0.006 | Seconds_per_step --> 3.376 | [2024-08-10 21:37:00,873][Main][INFO] - [train] Step 34850 out of 80000 | Loss --> 1.962 | Grad_l2 --> 0.326 | Weights_l2 --> 9065.547 | Lr --> 0.006 | Seconds_per_step --> 3.431 | [2024-08-10 21:39:50,852][Main][INFO] - [train] Step 34900 out of 80000 | Loss --> 1.970 | Grad_l2 --> 0.324 | Weights_l2 --> 9065.911 | Lr --> 0.006 | Seconds_per_step --> 3.400 | [2024-08-10 21:42:40,224][Main][INFO] - [train] Step 34950 out of 80000 | Loss --> 1.959 | Grad_l2 --> 0.324 | Weights_l2 --> 9066.270 | Lr --> 0.006 | Seconds_per_step --> 3.387 | [2024-08-10 21:45:37,170][Main][INFO] - [train] Step 35000 out of 80000 | Loss --> 1.963 | Grad_l2 --> 0.322 | Weights_l2 --> 9066.630 | Lr --> 0.006 | Seconds_per_step --> 3.539 | [2024-08-10 21:45:37,170][accelerate.accelerator][INFO] - Saving current state to checkpoint-pt-35000 [2024-08-10 21:45:37,174][accelerate.utils.other][WARNING] - Removed shared tensor {'encoder.embed_tokens.weight', 'decoder.embed_tokens.weight'} while saving. 
This should be OK, but check by verifying that you don't receive any warning while reloading [2024-08-10 21:45:39,174][accelerate.checkpointing][INFO] - Model weights saved in checkpoint-pt-35000/model.safetensors [2024-08-10 21:45:41,920][accelerate.checkpointing][INFO] - Optimizer state saved in checkpoint-pt-35000/optimizer.bin [2024-08-10 21:45:41,920][accelerate.checkpointing][INFO] - Scheduler state saved in checkpoint-pt-35000/scheduler.bin [2024-08-10 21:45:41,920][accelerate.checkpointing][INFO] - Sampler state for dataloader 0 saved in checkpoint-pt-35000/sampler.bin [2024-08-10 21:45:41,920][accelerate.checkpointing][INFO] - Sampler state for dataloader 1 saved in checkpoint-pt-35000/sampler_1.bin [2024-08-10 21:45:41,921][accelerate.checkpointing][INFO] - Random states saved in checkpoint-pt-35000/random_states_0.pkl [2024-08-10 21:48:30,259][Main][INFO] - [train] Step 35050 out of 80000 | Loss --> 1.962 | Grad_l2 --> 0.328 | Weights_l2 --> 9066.983 | Lr --> 0.006 | Seconds_per_step --> 3.462 | [2024-08-10 21:51:18,768][Main][INFO] - [train] Step 35100 out of 80000 | Loss --> 1.976 | Grad_l2 --> 0.324 | Weights_l2 --> 9067.320 | Lr --> 0.006 | Seconds_per_step --> 3.370 | [2024-08-10 21:54:07,440][Main][INFO] - [train] Step 35150 out of 80000 | Loss --> 1.973 | Grad_l2 --> 0.322 | Weights_l2 --> 9067.649 | Lr --> 0.006 | Seconds_per_step --> 3.373 | [2024-08-10 21:56:56,909][Main][INFO] - [train] Step 35200 out of 80000 | Loss --> 1.974 | Grad_l2 --> 0.327 | Weights_l2 --> 9068.010 | Lr --> 0.006 | Seconds_per_step --> 3.389 | [2024-08-10 21:59:46,025][Main][INFO] - [train] Step 35250 out of 80000 | Loss --> 1.975 | Grad_l2 --> 0.325 | Weights_l2 --> 9068.355 | Lr --> 0.006 | Seconds_per_step --> 3.382 | [2024-08-10 22:02:34,755][Main][INFO] - [train] Step 35300 out of 80000 | Loss --> 1.974 | Grad_l2 --> 0.325 | Weights_l2 --> 9068.705 | Lr --> 0.006 | Seconds_per_step --> 3.375 | [2024-08-10 22:05:24,758][Main][INFO] - [train] Step 35350 out of 80000 
| Loss --> 1.972 | Grad_l2 --> 0.327 | Weights_l2 --> 9069.051 | Lr --> 0.006 | Seconds_per_step --> 3.400 | [2024-08-10 22:08:13,479][Main][INFO] - [train] Step 35400 out of 80000 | Loss --> 1.974 | Grad_l2 --> 0.324 | Weights_l2 --> 9069.396 | Lr --> 0.006 | Seconds_per_step --> 3.374 | [2024-08-10 22:11:02,521][Main][INFO] - [train] Step 35450 out of 80000 | Loss --> 1.969 | Grad_l2 --> 0.323 | Weights_l2 --> 9069.720 | Lr --> 0.006 | Seconds_per_step --> 3.381 | [2024-08-10 22:13:50,439][Main][INFO] - [train] Step 35500 out of 80000 | Loss --> 1.972 | Grad_l2 --> 0.322 | Weights_l2 --> 9070.047 | Lr --> 0.006 | Seconds_per_step --> 3.358 | [2024-08-10 22:16:39,106][Main][INFO] - [train] Step 35550 out of 80000 | Loss --> 1.985 | Grad_l2 --> 0.324 | Weights_l2 --> 9070.389 | Lr --> 0.006 | Seconds_per_step --> 3.373 | [2024-08-10 22:19:27,802][Main][INFO] - [train] Step 35600 out of 80000 | Loss --> 1.979 | Grad_l2 --> 0.327 | Weights_l2 --> 9070.728 | Lr --> 0.006 | Seconds_per_step --> 3.374 | [2024-08-10 22:22:16,572][Main][INFO] - [train] Step 35650 out of 80000 | Loss --> 1.981 | Grad_l2 --> 0.325 | Weights_l2 --> 9071.049 | Lr --> 0.006 | Seconds_per_step --> 3.375 | [2024-08-10 22:25:04,796][Main][INFO] - [train] Step 35700 out of 80000 | Loss --> 1.984 | Grad_l2 --> 0.325 | Weights_l2 --> 9071.379 | Lr --> 0.006 | Seconds_per_step --> 3.364 | [2024-08-10 22:27:51,592][Main][INFO] - [train] Step 35750 out of 80000 | Loss --> 1.990 | Grad_l2 --> 0.322 | Weights_l2 --> 9071.695 | Lr --> 0.006 | Seconds_per_step --> 3.336 | [2024-08-10 22:30:40,355][Main][INFO] - [train] Step 35800 out of 80000 | Loss --> 1.978 | Grad_l2 --> 0.323 | Weights_l2 --> 9072.031 | Lr --> 0.006 | Seconds_per_step --> 3.375 | [2024-08-10 22:33:28,954][Main][INFO] - [train] Step 35850 out of 80000 | Loss --> 1.988 | Grad_l2 --> 0.320 | Weights_l2 --> 9072.335 | Lr --> 0.006 | Seconds_per_step --> 3.372 | [2024-08-10 22:36:18,068][Main][INFO] - [train] Step 35900 out of 80000 | Loss 
--> 1.985 | Grad_l2 --> 0.324 | Weights_l2 --> 9072.654 | Lr --> 0.006 | Seconds_per_step --> 3.382 | [2024-08-10 22:39:07,221][Main][INFO] - [train] Step 35950 out of 80000 | Loss --> 1.987 | Grad_l2 --> 0.321 | Weights_l2 --> 9072.975 | Lr --> 0.006 | Seconds_per_step --> 3.383 | [2024-08-10 22:41:56,739][Main][INFO] - [train] Step 36000 out of 80000 | Loss --> 1.981 | Grad_l2 --> 0.320 | Weights_l2 --> 9073.295 | Lr --> 0.006 | Seconds_per_step --> 3.390 | [2024-08-10 22:44:45,142][Main][INFO] - [train] Step 36050 out of 80000 | Loss --> 1.982 | Grad_l2 --> 0.324 | Weights_l2 --> 9073.610 | Lr --> 0.006 | Seconds_per_step --> 3.368 | [2024-08-10 22:47:33,362][Main][INFO] - [train] Step 36100 out of 80000 | Loss --> 1.981 | Grad_l2 --> 0.321 | Weights_l2 --> 9073.917 | Lr --> 0.006 | Seconds_per_step --> 3.364 | [2024-08-10 22:50:22,054][Main][INFO] - [train] Step 36150 out of 80000 | Loss --> 1.984 | Grad_l2 --> 0.323 | Weights_l2 --> 9074.221 | Lr --> 0.006 | Seconds_per_step --> 3.374 | [2024-08-10 22:53:11,318][Main][INFO] - [train] Step 36200 out of 80000 | Loss --> 1.989 | Grad_l2 --> 0.326 | Weights_l2 --> 9074.538 | Lr --> 0.006 | Seconds_per_step --> 3.385 | [2024-08-10 22:55:59,443][Main][INFO] - [train] Step 36250 out of 80000 | Loss --> 1.980 | Grad_l2 --> 0.323 | Weights_l2 --> 9074.854 | Lr --> 0.006 | Seconds_per_step --> 3.363 | [2024-08-10 22:58:47,752][Main][INFO] - [train] Step 36300 out of 80000 | Loss --> 1.980 | Grad_l2 --> 0.322 | Weights_l2 --> 9075.159 | Lr --> 0.006 | Seconds_per_step --> 3.366 | [2024-08-10 23:01:36,450][Main][INFO] - [train] Step 36350 out of 80000 | Loss --> 1.992 | Grad_l2 --> 0.320 | Weights_l2 --> 9075.441 | Lr --> 0.006 | Seconds_per_step --> 3.374 | [2024-08-10 23:04:25,574][Main][INFO] - [train] Step 36400 out of 80000 | Loss --> 1.985 | Grad_l2 --> 0.321 | Weights_l2 --> 9075.748 | Lr --> 0.006 | Seconds_per_step --> 3.382 | [2024-08-10 23:07:14,442][Main][INFO] - [train] Step 36450 out of 80000 | Loss --> 
1.987 | Grad_l2 --> 0.319 | Weights_l2 --> 9076.036 | Lr --> 0.006 | Seconds_per_step --> 3.377 | [2024-08-10 23:10:04,537][Main][INFO] - [train] Step 36500 out of 80000 | Loss --> 1.992 | Grad_l2 --> 0.323 | Weights_l2 --> 9076.326 | Lr --> 0.005 | Seconds_per_step --> 3.402 | [2024-08-10 23:12:52,857][Main][INFO] - [train] Step 36550 out of 80000 | Loss --> 1.986 | Grad_l2 --> 0.325 | Weights_l2 --> 9076.642 | Lr --> 0.005 | Seconds_per_step --> 3.366 | [2024-08-10 23:15:41,926][Main][INFO] - [train] Step 36600 out of 80000 | Loss --> 1.989 | Grad_l2 --> 0.322 | Weights_l2 --> 9076.938 | Lr --> 0.005 | Seconds_per_step --> 3.381 | [2024-08-10 23:18:31,293][Main][INFO] - [train] Step 36650 out of 80000 | Loss --> 1.986 | Grad_l2 --> 0.321 | Weights_l2 --> 9077.225 | Lr --> 0.005 | Seconds_per_step --> 3.387 | [2024-08-10 23:21:20,030][Main][INFO] - [train] Step 36700 out of 80000 | Loss --> 1.996 | Grad_l2 --> 0.318 | Weights_l2 --> 9077.501 | Lr --> 0.005 | Seconds_per_step --> 3.375 | [2024-08-10 23:24:08,655][Main][INFO] - [train] Step 36750 out of 80000 | Loss --> 1.985 | Grad_l2 --> 0.321 | Weights_l2 --> 9077.785 | Lr --> 0.005 | Seconds_per_step --> 3.372 | [2024-08-10 23:26:59,534][Main][INFO] - [train] Step 36800 out of 80000 | Loss --> 1.989 | Grad_l2 --> 0.326 | Weights_l2 --> 9078.068 | Lr --> 0.005 | Seconds_per_step --> 3.418 | [2024-08-10 23:29:48,887][Main][INFO] - [train] Step 36850 out of 80000 | Loss --> 1.988 | Grad_l2 --> 0.321 | Weights_l2 --> 9078.353 | Lr --> 0.005 | Seconds_per_step --> 3.387 | [2024-08-10 23:32:38,332][Main][INFO] - [train] Step 36900 out of 80000 | Loss --> 1.989 | Grad_l2 --> 0.321 | Weights_l2 --> 9078.633 | Lr --> 0.005 | Seconds_per_step --> 3.389 | [2024-08-10 23:35:27,561][Main][INFO] - [train] Step 36950 out of 80000 | Loss --> 1.997 | Grad_l2 --> 0.322 | Weights_l2 --> 9078.910 | Lr --> 0.005 | Seconds_per_step --> 3.385 | [2024-08-10 23:38:17,143][Main][INFO] - [train] Step 37000 out of 80000 | Loss --> 1.987 | 
Grad_l2 --> 0.319 | Weights_l2 --> 9079.184 | Lr --> 0.005 | Seconds_per_step --> 3.392 | [2024-08-10 23:41:05,847][Main][INFO] - [train] Step 37050 out of 80000 | Loss --> 1.978 | Grad_l2 --> 0.318 | Weights_l2 --> 9079.459 | Lr --> 0.005 | Seconds_per_step --> 3.374 | [2024-08-10 23:43:54,811][Main][INFO] - [train] Step 37100 out of 80000 | Loss --> 1.989 | Grad_l2 --> 0.320 | Weights_l2 --> 9079.752 | Lr --> 0.005 | Seconds_per_step --> 3.379 | [2024-08-10 23:46:43,768][Main][INFO] - [train] Step 37150 out of 80000 | Loss --> 1.987 | Grad_l2 --> 0.321 | Weights_l2 --> 9080.034 | Lr --> 0.005 | Seconds_per_step --> 3.379 | [2024-08-10 23:49:32,160][Main][INFO] - [train] Step 37200 out of 80000 | Loss --> 1.994 | Grad_l2 --> 0.323 | Weights_l2 --> 9080.289 | Lr --> 0.005 | Seconds_per_step --> 3.368 | [2024-08-10 23:52:20,720][Main][INFO] - [train] Step 37250 out of 80000 | Loss --> 1.986 | Grad_l2 --> 0.315 | Weights_l2 --> 9080.560 | Lr --> 0.005 | Seconds_per_step --> 3.371 | [2024-08-10 23:55:08,868][Main][INFO] - [train] Step 37300 out of 80000 | Loss --> 1.996 | Grad_l2 --> 0.321 | Weights_l2 --> 9080.819 | Lr --> 0.005 | Seconds_per_step --> 3.363 | [2024-08-10 23:57:58,146][Main][INFO] - [train] Step 37350 out of 80000 | Loss --> 1.980 | Grad_l2 --> 0.318 | Weights_l2 --> 9081.084 | Lr --> 0.005 | Seconds_per_step --> 3.386 | [2024-08-11 00:00:46,523][Main][INFO] - [train] Step 37400 out of 80000 | Loss --> 1.985 | Grad_l2 --> 0.318 | Weights_l2 --> 9081.347 | Lr --> 0.005 | Seconds_per_step --> 3.368 | [2024-08-11 00:03:35,784][Main][INFO] - [train] Step 37450 out of 80000 | Loss --> 1.988 | Grad_l2 --> 0.318 | Weights_l2 --> 9081.604 | Lr --> 0.005 | Seconds_per_step --> 3.385 | [2024-08-11 00:06:24,722][Main][INFO] - [train] Step 37500 out of 80000 | Loss --> 1.987 | Grad_l2 --> 0.318 | Weights_l2 --> 9081.851 | Lr --> 0.005 | Seconds_per_step --> 3.379 | [2024-08-11 00:09:13,944][Main][INFO] - [train] Step 37550 out of 80000 | Loss --> 1.980 | Grad_l2 
--> 0.319 | Weights_l2 --> 9082.097 | Lr --> 0.005 | Seconds_per_step --> 3.384 | [2024-08-11 00:12:02,407][Main][INFO] - [train] Step 37600 out of 80000 | Loss --> 1.995 | Grad_l2 --> 0.316 | Weights_l2 --> 9082.342 | Lr --> 0.005 | Seconds_per_step --> 3.369 | [2024-08-11 00:14:51,432][Main][INFO] - [train] Step 37650 out of 80000 | Loss --> 1.992 | Grad_l2 --> 0.318 | Weights_l2 --> 9082.588 | Lr --> 0.005 | Seconds_per_step --> 3.380 | [2024-08-11 00:17:39,935][Main][INFO] - [train] Step 37700 out of 80000 | Loss --> 1.994 | Grad_l2 --> 0.315 | Weights_l2 --> 9082.847 | Lr --> 0.005 | Seconds_per_step --> 3.370 | [2024-08-11 00:20:28,827][Main][INFO] - [train] Step 37750 out of 80000 | Loss --> 1.987 | Grad_l2 --> 0.320 | Weights_l2 --> 9083.095 | Lr --> 0.005 | Seconds_per_step --> 3.378 | [2024-08-11 00:23:16,528][Main][INFO] - [train] Step 37800 out of 80000 | Loss --> 1.995 | Grad_l2 --> 0.318 | Weights_l2 --> 9083.336 | Lr --> 0.005 | Seconds_per_step --> 3.354 | [2024-08-11 00:26:04,576][Main][INFO] - [train] Step 37850 out of 80000 | Loss --> 1.993 | Grad_l2 --> 0.320 | Weights_l2 --> 9083.585 | Lr --> 0.005 | Seconds_per_step --> 3.361 | [2024-08-11 00:28:53,091][Main][INFO] - [train] Step 37900 out of 80000 | Loss --> 1.991 | Grad_l2 --> 0.317 | Weights_l2 --> 9083.812 | Lr --> 0.005 | Seconds_per_step --> 3.370 | [2024-08-11 00:31:41,481][Main][INFO] - [train] Step 37950 out of 80000 | Loss --> 1.985 | Grad_l2 --> 0.315 | Weights_l2 --> 9084.061 | Lr --> 0.005 | Seconds_per_step --> 3.368 | [2024-08-11 00:34:29,788][Main][INFO] - [train] Step 38000 out of 80000 | Loss --> 1.987 | Grad_l2 --> 0.315 | Weights_l2 --> 9084.285 | Lr --> 0.005 | Seconds_per_step --> 3.366 | [2024-08-11 00:37:19,033][Main][INFO] - [train] Step 38050 out of 80000 | Loss --> 1.985 | Grad_l2 --> 0.315 | Weights_l2 --> 9084.521 | Lr --> 0.005 | Seconds_per_step --> 3.385 | [2024-08-11 00:40:06,692][Main][INFO] - [train] Step 38100 out of 80000 | Loss --> 1.989 | Grad_l2 --> 
0.314 | Weights_l2 --> 9084.736 | Lr --> 0.005 | Seconds_per_step --> 3.353 | [2024-08-11 00:42:54,429][Main][INFO] - [train] Step 38150 out of 80000 | Loss --> 1.984 | Grad_l2 --> 0.317 | Weights_l2 --> 9084.960 | Lr --> 0.005 | Seconds_per_step --> 3.355 | [2024-08-11 00:45:43,227][Main][INFO] - [train] Step 38200 out of 80000 | Loss --> 1.978 | Grad_l2 --> 0.315 | Weights_l2 --> 9085.192 | Lr --> 0.005 | Seconds_per_step --> 3.376 | [2024-08-11 00:48:31,813][Main][INFO] - [train] Step 38250 out of 80000 | Loss --> 1.989 | Grad_l2 --> 0.316 | Weights_l2 --> 9085.401 | Lr --> 0.005 | Seconds_per_step --> 3.372 | [2024-08-11 00:51:19,799][Main][INFO] - [train] Step 38300 out of 80000 | Loss --> 1.987 | Grad_l2 --> 0.316 | Weights_l2 --> 9085.644 | Lr --> 0.005 | Seconds_per_step --> 3.360 | [2024-08-11 00:54:08,857][Main][INFO] - [train] Step 38350 out of 80000 | Loss --> 1.983 | Grad_l2 --> 0.315 | Weights_l2 --> 9085.876 | Lr --> 0.005 | Seconds_per_step --> 3.381 | [2024-08-11 00:56:56,916][Main][INFO] - [train] Step 38400 out of 80000 | Loss --> 1.984 | Grad_l2 --> 0.313 | Weights_l2 --> 9086.108 | Lr --> 0.005 | Seconds_per_step --> 3.361 | [2024-08-11 00:59:46,820][Main][INFO] - [train] Step 38450 out of 80000 | Loss --> 1.976 | Grad_l2 --> 0.313 | Weights_l2 --> 9086.323 | Lr --> 0.005 | Seconds_per_step --> 3.398 | [2024-08-11 01:02:36,195][Main][INFO] - [train] Step 38500 out of 80000 | Loss --> 1.969 | Grad_l2 --> 0.315 | Weights_l2 --> 9086.523 | Lr --> 0.005 | Seconds_per_step --> 3.387 | [2024-08-11 01:05:25,092][Main][INFO] - [train] Step 38550 out of 80000 | Loss --> 1.971 | Grad_l2 --> 0.313 | Weights_l2 --> 9086.744 | Lr --> 0.005 | Seconds_per_step --> 3.378 | [2024-08-11 01:08:13,931][Main][INFO] - [train] Step 38600 out of 80000 | Loss --> 1.969 | Grad_l2 --> 0.313 | Weights_l2 --> 9086.958 | Lr --> 0.005 | Seconds_per_step --> 3.377 | [2024-08-11 01:11:03,451][Main][INFO] - [train] Step 38650 out of 80000 | Loss --> 1.972 | Grad_l2 --> 0.310 | 
Weights_l2 --> 9087.155 | Lr --> 0.005 | Seconds_per_step --> 3.390 | [2024-08-11 01:13:52,933][Main][INFO] - [train] Step 38700 out of 80000 | Loss --> 1.969 | Grad_l2 --> 0.311 | Weights_l2 --> 9087.358 | Lr --> 0.005 | Seconds_per_step --> 3.390 | [2024-08-11 01:16:41,729][Main][INFO] - [train] Step 38750 out of 80000 | Loss --> 1.965 | Grad_l2 --> 0.312 | Weights_l2 --> 9087.569 | Lr --> 0.005 | Seconds_per_step --> 3.376 | [2024-08-11 01:19:30,203][Main][INFO] - [train] Step 38800 out of 80000 | Loss --> 1.975 | Grad_l2 --> 0.312 | Weights_l2 --> 9087.770 | Lr --> 0.005 | Seconds_per_step --> 3.369 | [2024-08-11 01:22:18,941][Main][INFO] - [train] Step 38850 out of 80000 | Loss --> 1.961 | Grad_l2 --> 0.312 | Weights_l2 --> 9087.988 | Lr --> 0.005 | Seconds_per_step --> 3.375 | [2024-08-11 01:25:08,309][Main][INFO] - [train] Step 38900 out of 80000 | Loss --> 1.955 | Grad_l2 --> 0.311 | Weights_l2 --> 9088.181 | Lr --> 0.005 | Seconds_per_step --> 3.387 | [2024-08-11 01:27:57,134][Main][INFO] - [train] Step 38950 out of 80000 | Loss --> 1.954 | Grad_l2 --> 0.311 | Weights_l2 --> 9088.378 | Lr --> 0.005 | Seconds_per_step --> 3.376 | [2024-08-11 01:30:46,945][Main][INFO] - [train] Step 39000 out of 80000 | Loss --> 1.964 | Grad_l2 --> 0.311 | Weights_l2 --> 9088.599 | Lr --> 0.005 | Seconds_per_step --> 3.396 | [2024-08-11 01:33:35,903][Main][INFO] - [train] Step 39050 out of 80000 | Loss --> 1.952 | Grad_l2 --> 0.311 | Weights_l2 --> 9088.786 | Lr --> 0.005 | Seconds_per_step --> 3.379 | [2024-08-11 01:36:25,983][Main][INFO] - [train] Step 39100 out of 80000 | Loss --> 1.963 | Grad_l2 --> 0.312 | Weights_l2 --> 9088.981 | Lr --> 0.005 | Seconds_per_step --> 3.402 | [2024-08-11 01:39:15,477][Main][INFO] - [train] Step 39150 out of 80000 | Loss --> 1.957 | Grad_l2 --> 0.310 | Weights_l2 --> 9089.179 | Lr --> 0.005 | Seconds_per_step --> 3.390 | [2024-08-11 01:42:04,924][Main][INFO] - [train] Step 39200 out of 80000 | Loss --> 1.962 | Grad_l2 --> 0.314 | 
Weights_l2 --> 9089.373 | Lr --> 0.005 | Seconds_per_step --> 3.389 | [2024-08-11 01:44:54,356][Main][INFO] - [train] Step 39250 out of 80000 | Loss --> 1.954 | Grad_l2 --> 0.310 | Weights_l2 --> 9089.562 | Lr --> 0.005 | Seconds_per_step --> 3.389 | [2024-08-11 01:47:45,286][Main][INFO] - [train] Step 39300 out of 80000 | Loss --> 1.949 | Grad_l2 --> 0.309 | Weights_l2 --> 9089.747 | Lr --> 0.005 | Seconds_per_step --> 3.419 | [2024-08-11 01:50:34,944][Main][INFO] - [train] Step 39350 out of 80000 | Loss --> 1.957 | Grad_l2 --> 0.312 | Weights_l2 --> 9089.936 | Lr --> 0.005 | Seconds_per_step --> 3.393 | [2024-08-11 01:53:24,603][Main][INFO] - [train] Step 39400 out of 80000 | Loss --> 1.962 | Grad_l2 --> 0.312 | Weights_l2 --> 9090.121 | Lr --> 0.005 | Seconds_per_step --> 3.393 | [2024-08-11 01:56:14,325][Main][INFO] - [train] Step 39450 out of 80000 | Loss --> 1.950 | Grad_l2 --> 0.314 | Weights_l2 --> 9090.300 | Lr --> 0.005 | Seconds_per_step --> 3.394 | [2024-08-11 01:59:04,661][Main][INFO] - [train] Step 39500 out of 80000 | Loss --> 1.948 | Grad_l2 --> 0.310 | Weights_l2 --> 9090.498 | Lr --> 0.005 | Seconds_per_step --> 3.407 | [2024-08-11 02:01:53,842][Main][INFO] - [train] Step 39550 out of 80000 | Loss --> 1.948 | Grad_l2 --> 0.315 | Weights_l2 --> 9090.693 | Lr --> 0.005 | Seconds_per_step --> 3.384 | [2024-08-11 02:04:42,792][Main][INFO] - [train] Step 39600 out of 80000 | Loss --> 1.952 | Grad_l2 --> 0.310 | Weights_l2 --> 9090.871 | Lr --> 0.005 | Seconds_per_step --> 3.379 | [2024-08-11 02:07:31,737][Main][INFO] - [train] Step 39650 out of 80000 | Loss --> 1.950 | Grad_l2 --> 0.311 | Weights_l2 --> 9091.055 | Lr --> 0.005 | Seconds_per_step --> 3.379 | [2024-08-11 02:10:22,230][Main][INFO] - [train] Step 39700 out of 80000 | Loss --> 1.947 | Grad_l2 --> 0.309 | Weights_l2 --> 9091.222 | Lr --> 0.005 | Seconds_per_step --> 3.410 | [2024-08-11 02:13:12,097][Main][INFO] - [train] Step 39750 out of 80000 | Loss --> 1.947 | Grad_l2 --> 0.309 | 
Weights_l2 --> 9091.387 | Lr --> 0.005 | Seconds_per_step --> 3.397 | [2024-08-11 02:16:01,033][Main][INFO] - [train] Step 39800 out of 80000 | Loss --> 1.952 | Grad_l2 --> 0.314 | Weights_l2 --> 9091.557 | Lr --> 0.005 | Seconds_per_step --> 3.379 | [2024-08-11 02:18:49,894][Main][INFO] - [train] Step 39850 out of 80000 | Loss --> 1.946 | Grad_l2 --> 0.309 | Weights_l2 --> 9091.734 | Lr --> 0.005 | Seconds_per_step --> 3.377 | [2024-08-11 02:21:39,391][Main][INFO] - [train] Step 39900 out of 80000 | Loss --> 1.947 | Grad_l2 --> 0.309 | Weights_l2 --> 9091.896 | Lr --> 0.005 | Seconds_per_step --> 3.390 | [2024-08-11 02:24:29,042][Main][INFO] - [train] Step 39950 out of 80000 | Loss --> 1.948 | Grad_l2 --> 0.309 | Weights_l2 --> 9092.072 | Lr --> 0.005 | Seconds_per_step --> 3.393 | [2024-08-11 02:27:18,462][Main][INFO] - [train] Step 40000 out of 80000 | Loss --> 1.945 | Grad_l2 --> 0.308 | Weights_l2 --> 9092.238 | Lr --> 0.005 | Seconds_per_step --> 3.388 | [2024-08-11 02:27:18,462][accelerate.accelerator][INFO] - Saving current state to checkpoint-pt-40000 [2024-08-11 02:27:18,465][accelerate.utils.other][WARNING] - Removed shared tensor {'encoder.embed_tokens.weight', 'decoder.embed_tokens.weight'} while saving. 
This should be OK, but check by verifying that you don't receive any warning while reloading [2024-08-11 02:27:20,578][accelerate.checkpointing][INFO] - Model weights saved in checkpoint-pt-40000/model.safetensors [2024-08-11 02:27:23,421][accelerate.checkpointing][INFO] - Optimizer state saved in checkpoint-pt-40000/optimizer.bin [2024-08-11 02:27:23,421][accelerate.checkpointing][INFO] - Scheduler state saved in checkpoint-pt-40000/scheduler.bin [2024-08-11 02:27:23,421][accelerate.checkpointing][INFO] - Sampler state for dataloader 0 saved in checkpoint-pt-40000/sampler.bin [2024-08-11 02:27:23,421][accelerate.checkpointing][INFO] - Sampler state for dataloader 1 saved in checkpoint-pt-40000/sampler_1.bin [2024-08-11 02:27:23,422][accelerate.checkpointing][INFO] - Random states saved in checkpoint-pt-40000/random_states_0.pkl [2024-08-11 02:30:12,274][Main][INFO] - [train] Step 40050 out of 80000 | Loss --> 1.938 | Grad_l2 --> 0.311 | Weights_l2 --> 9092.394 | Lr --> 0.005 | Seconds_per_step --> 3.476 | [2024-08-11 02:33:02,978][Main][INFO] - [train] Step 40100 out of 80000 | Loss --> 1.936 | Grad_l2 --> 0.313 | Weights_l2 --> 9092.562 | Lr --> 0.005 | Seconds_per_step --> 3.414 | [2024-08-11 02:35:52,303][Main][INFO] - [train] Step 40150 out of 80000 | Loss --> 1.945 | Grad_l2 --> 0.309 | Weights_l2 --> 9092.724 | Lr --> 0.005 | Seconds_per_step --> 3.386 | [2024-08-11 02:38:41,408][Main][INFO] - [train] Step 40200 out of 80000 | Loss --> 1.940 | Grad_l2 --> 0.310 | Weights_l2 --> 9092.882 | Lr --> 0.005 | Seconds_per_step --> 3.382 | [2024-08-11 02:41:29,521][Main][INFO] - [train] Step 40250 out of 80000 | Loss --> 1.935 | Grad_l2 --> 0.310 | Weights_l2 --> 9093.039 | Lr --> 0.005 | Seconds_per_step --> 3.362 | [2024-08-11 02:44:17,817][Main][INFO] - [train] Step 40300 out of 80000 | Loss --> 1.928 | Grad_l2 --> 0.311 | Weights_l2 --> 9093.200 | Lr --> 0.005 | Seconds_per_step --> 3.366 | [2024-08-11 02:47:06,404][Main][INFO] - [train] Step 40350 out of 80000 
| Loss --> 1.934 | Grad_l2 --> 0.309 | Weights_l2 --> 9093.363 | Lr --> 0.005 | Seconds_per_step --> 3.372 | [2024-08-11 02:49:54,901][Main][INFO] - [train] Step 40400 out of 80000 | Loss --> 1.930 | Grad_l2 --> 0.309 | Weights_l2 --> 9093.498 | Lr --> 0.005 | Seconds_per_step --> 3.370 | [2024-08-11 02:52:45,475][Main][INFO] - [train] Step 40450 out of 80000 | Loss --> 1.936 | Grad_l2 --> 0.311 | Weights_l2 --> 9093.665 | Lr --> 0.005 | Seconds_per_step --> 3.411 | [2024-08-11 02:55:35,053][Main][INFO] - [train] Step 40500 out of 80000 | Loss --> 1.919 | Grad_l2 --> 0.309 | Weights_l2 --> 9093.834 | Lr --> 0.005 | Seconds_per_step --> 3.392 | [2024-08-11 02:58:25,674][Main][INFO] - [train] Step 40550 out of 80000 | Loss --> 1.927 | Grad_l2 --> 0.308 | Weights_l2 --> 9093.998 | Lr --> 0.005 | Seconds_per_step --> 3.412 | [2024-08-11 03:01:14,558][Main][INFO] - [train] Step 40600 out of 80000 | Loss --> 1.933 | Grad_l2 --> 0.306 | Weights_l2 --> 9094.139 | Lr --> 0.005 | Seconds_per_step --> 3.378 | [2024-08-11 03:04:03,662][Main][INFO] - [train] Step 40650 out of 80000 | Loss --> 1.927 | Grad_l2 --> 0.308 | Weights_l2 --> 9094.289 | Lr --> 0.005 | Seconds_per_step --> 3.382 | [2024-08-11 03:06:52,826][Main][INFO] - [train] Step 40700 out of 80000 | Loss --> 1.912 | Grad_l2 --> 0.308 | Weights_l2 --> 9094.446 | Lr --> 0.005 | Seconds_per_step --> 3.383 | [2024-08-11 03:09:42,887][Main][INFO] - [train] Step 40750 out of 80000 | Loss --> 1.911 | Grad_l2 --> 0.309 | Weights_l2 --> 9094.589 | Lr --> 0.005 | Seconds_per_step --> 3.401 | [2024-08-11 03:12:31,609][Main][INFO] - [train] Step 40800 out of 80000 | Loss --> 1.918 | Grad_l2 --> 0.311 | Weights_l2 --> 9094.736 | Lr --> 0.005 | Seconds_per_step --> 3.374 | [2024-08-11 03:15:18,925][Main][INFO] - [train] Step 40850 out of 80000 | Loss --> 1.915 | Grad_l2 --> 0.309 | Weights_l2 --> 9094.879 | Lr --> 0.005 | Seconds_per_step --> 3.346 | [2024-08-11 03:18:07,730][Main][INFO] - [train] Step 40900 out of 80000 | Loss 
--> 1.910 | Grad_l2 --> 0.306 | Weights_l2 --> 9095.030 | Lr --> 0.005 | Seconds_per_step --> 3.376 | [2024-08-11 03:21:01,061][Main][INFO] - [train] Step 40950 out of 80000 | Loss --> 1.916 | Grad_l2 --> 0.304 | Weights_l2 --> 9095.165 | Lr --> 0.005 | Seconds_per_step --> 3.467 | [2024-08-11 03:23:50,091][Main][INFO] - [train] Step 41000 out of 80000 | Loss --> 1.912 | Grad_l2 --> 0.307 | Weights_l2 --> 9095.298 | Lr --> 0.005 | Seconds_per_step --> 3.381 | [2024-08-11 03:26:38,484][Main][INFO] - [train] Step 41050 out of 80000 | Loss --> 1.926 | Grad_l2 --> 0.308 | Weights_l2 --> 9095.436 | Lr --> 0.005 | Seconds_per_step --> 3.368 | [2024-08-11 03:29:26,707][Main][INFO] - [train] Step 41100 out of 80000 | Loss --> 1.916 | Grad_l2 --> 0.309 | Weights_l2 --> 9095.571 | Lr --> 0.005 | Seconds_per_step --> 3.364 | [2024-08-11 03:32:17,300][Main][INFO] - [train] Step 41150 out of 80000 | Loss --> 1.914 | Grad_l2 --> 0.308 | Weights_l2 --> 9095.706 | Lr --> 0.005 | Seconds_per_step --> 3.412 | [2024-08-11 03:35:06,978][Main][INFO] - [train] Step 41200 out of 80000 | Loss --> 1.914 | Grad_l2 --> 0.306 | Weights_l2 --> 9095.831 | Lr --> 0.005 | Seconds_per_step --> 3.394 | [2024-08-11 03:37:56,000][Main][INFO] - [train] Step 41250 out of 80000 | Loss --> 1.914 | Grad_l2 --> 0.306 | Weights_l2 --> 9095.964 | Lr --> 0.005 | Seconds_per_step --> 3.380 | [2024-08-11 03:40:45,631][Main][INFO] - [train] Step 41300 out of 80000 | Loss --> 1.912 | Grad_l2 --> 0.306 | Weights_l2 --> 9096.088 | Lr --> 0.005 | Seconds_per_step --> 3.393 | [2024-08-11 03:43:35,918][Main][INFO] - [train] Step 41350 out of 80000 | Loss --> 1.898 | Grad_l2 --> 0.304 | Weights_l2 --> 9096.215 | Lr --> 0.005 | Seconds_per_step --> 3.406 | [2024-08-11 03:46:26,811][Main][INFO] - [train] Step 41400 out of 80000 | Loss --> 1.899 | Grad_l2 --> 0.306 | Weights_l2 --> 9096.371 | Lr --> 0.005 | Seconds_per_step --> 3.418 | [2024-08-11 03:49:16,649][Main][INFO] - [train] Step 41450 out of 80000 | Loss --> 
1.909 | Grad_l2 --> 0.305 | Weights_l2 --> 9096.494 | Lr --> 0.005 | Seconds_per_step --> 3.397 | [2024-08-11 03:52:06,340][Main][INFO] - [train] Step 41500 out of 80000 | Loss --> 1.900 | Grad_l2 --> 0.304 | Weights_l2 --> 9096.611 | Lr --> 0.005 | Seconds_per_step --> 3.394 | [2024-08-11 03:54:56,488][Main][INFO] - [train] Step 41550 out of 80000 | Loss --> 1.900 | Grad_l2 --> 0.305 | Weights_l2 --> 9096.724 | Lr --> 0.005 | Seconds_per_step --> 3.403 | [2024-08-11 03:57:46,246][Main][INFO] - [train] Step 41600 out of 80000 | Loss --> 1.904 | Grad_l2 --> 0.306 | Weights_l2 --> 9096.859 | Lr --> 0.005 | Seconds_per_step --> 3.395 | [2024-08-11 04:00:36,209][Main][INFO] - [train] Step 41650 out of 80000 | Loss --> 1.900 | Grad_l2 --> 0.305 | Weights_l2 --> 9096.976 | Lr --> 0.005 | Seconds_per_step --> 3.399 | [2024-08-11 04:03:26,087][Main][INFO] - [train] Step 41700 out of 80000 | Loss --> 1.896 | Grad_l2 --> 0.306 | Weights_l2 --> 9097.099 | Lr --> 0.005 | Seconds_per_step --> 3.398 | [2024-08-11 04:06:16,494][Main][INFO] - [train] Step 41750 out of 80000 | Loss --> 1.906 | Grad_l2 --> 0.305 | Weights_l2 --> 9097.204 | Lr --> 0.005 | Seconds_per_step --> 3.408 | [2024-08-11 04:09:06,107][Main][INFO] - [train] Step 41800 out of 80000 | Loss --> 1.902 | Grad_l2 --> 0.304 | Weights_l2 --> 9097.322 | Lr --> 0.005 | Seconds_per_step --> 3.392 | [2024-08-11 04:11:54,877][Main][INFO] - [train] Step 41850 out of 80000 | Loss --> 1.892 | Grad_l2 --> 0.305 | Weights_l2 --> 9097.438 | Lr --> 0.005 | Seconds_per_step --> 3.375 | [2024-08-11 04:14:45,250][Main][INFO] - [train] Step 41900 out of 80000 | Loss --> 1.893 | Grad_l2 --> 0.304 | Weights_l2 --> 9097.555 | Lr --> 0.005 | Seconds_per_step --> 3.407 | [2024-08-11 04:17:33,003][Main][INFO] - [train] Step 41950 out of 80000 | Loss --> 1.886 | Grad_l2 --> 0.305 | Weights_l2 --> 9097.670 | Lr --> 0.005 | Seconds_per_step --> 3.355 | [2024-08-11 04:20:23,408][Main][INFO] - [train] Step 42000 out of 80000 | Loss --> 1.895 | 
Grad_l2 --> 0.304 | Weights_l2 --> 9097.786 | Lr --> 0.005 | Seconds_per_step --> 3.408 | [2024-08-11 04:23:12,871][Main][INFO] - [train] Step 42050 out of 80000 | Loss --> 1.890 | Grad_l2 --> 0.303 | Weights_l2 --> 9097.903 | Lr --> 0.005 | Seconds_per_step --> 3.389 | [2024-08-11 04:26:10,451][Main][INFO] - [train] Step 42100 out of 80000 | Loss --> 1.887 | Grad_l2 --> 0.304 | Weights_l2 --> 9097.995 | Lr --> 0.005 | Seconds_per_step --> 3.552 | [2024-08-11 04:29:00,301][Main][INFO] - [train] Step 42150 out of 80000 | Loss --> 1.886 | Grad_l2 --> 0.303 | Weights_l2 --> 9098.083 | Lr --> 0.005 | Seconds_per_step --> 3.397 | [2024-08-11 04:31:50,543][Main][INFO] - [train] Step 42200 out of 80000 | Loss --> 1.885 | Grad_l2 --> 0.304 | Weights_l2 --> 9098.170 | Lr --> 0.005 | Seconds_per_step --> 3.405 | [2024-08-11 04:34:40,218][Main][INFO] - [train] Step 42250 out of 80000 | Loss --> 1.885 | Grad_l2 --> 0.304 | Weights_l2 --> 9098.272 | Lr --> 0.004 | Seconds_per_step --> 3.393 | [2024-08-11 04:37:29,693][Main][INFO] - [train] Step 42300 out of 80000 | Loss --> 1.879 | Grad_l2 --> 0.304 | Weights_l2 --> 9098.378 | Lr --> 0.004 | Seconds_per_step --> 3.389 | [2024-08-11 04:40:18,841][Main][INFO] - [train] Step 42350 out of 80000 | Loss --> 1.884 | Grad_l2 --> 0.306 | Weights_l2 --> 9098.505 | Lr --> 0.004 | Seconds_per_step --> 3.383 | [2024-08-11 04:43:07,106][Main][INFO] - [train] Step 42400 out of 80000 | Loss --> 1.887 | Grad_l2 --> 0.304 | Weights_l2 --> 9098.599 | Lr --> 0.004 | Seconds_per_step --> 3.365 | [2024-08-11 04:45:54,958][Main][INFO] - [train] Step 42450 out of 80000 | Loss --> 1.890 | Grad_l2 --> 0.304 | Weights_l2 --> 9098.697 | Lr --> 0.004 | Seconds_per_step --> 3.357 | [2024-08-11 04:48:42,930][Main][INFO] - [train] Step 42500 out of 80000 | Loss --> 1.889 | Grad_l2 --> 0.301 | Weights_l2 --> 9098.800 | Lr --> 0.004 | Seconds_per_step --> 3.359 | [2024-08-11 04:51:31,338][Main][INFO] - [train] Step 42550 out of 80000 | Loss --> 1.888 | Grad_l2 
--> 0.302 | Weights_l2 --> 9098.884 | Lr --> 0.004 | Seconds_per_step --> 3.368 | [2024-08-11 04:54:20,337][Main][INFO] - [train] Step 42600 out of 80000 | Loss --> 1.885 | Grad_l2 --> 0.304 | Weights_l2 --> 9098.972 | Lr --> 0.004 | Seconds_per_step --> 3.380 | [2024-08-11 04:57:07,121][Main][INFO] - [train] Step 42650 out of 80000 | Loss --> 1.881 | Grad_l2 --> 0.303 | Weights_l2 --> 9099.064 | Lr --> 0.004 | Seconds_per_step --> 3.336 | [2024-08-11 04:59:54,453][Main][INFO] - [train] Step 42700 out of 80000 | Loss --> 1.886 | Grad_l2 --> 0.308 | Weights_l2 --> 9099.155 | Lr --> 0.004 | Seconds_per_step --> 3.347 | [2024-08-11 05:02:44,319][Main][INFO] - [train] Step 42750 out of 80000 | Loss --> 1.883 | Grad_l2 --> 0.301 | Weights_l2 --> 9099.248 | Lr --> 0.004 | Seconds_per_step --> 3.397 | [2024-08-11 05:05:34,713][Main][INFO] - [train] Step 42800 out of 80000 | Loss --> 1.889 | Grad_l2 --> 0.305 | Weights_l2 --> 9099.338 | Lr --> 0.004 | Seconds_per_step --> 3.408 | [2024-08-11 05:08:25,333][Main][INFO] - [train] Step 42850 out of 80000 | Loss --> 1.899 | Grad_l2 --> 0.304 | Weights_l2 --> 9099.447 | Lr --> 0.004 | Seconds_per_step --> 3.412 | [2024-08-11 05:11:15,256][Main][INFO] - [train] Step 42900 out of 80000 | Loss --> 1.881 | Grad_l2 --> 0.304 | Weights_l2 --> 9099.536 | Lr --> 0.004 | Seconds_per_step --> 3.398 | [2024-08-11 05:14:05,070][Main][INFO] - [train] Step 42950 out of 80000 | Loss --> 1.886 | Grad_l2 --> 0.303 | Weights_l2 --> 9099.635 | Lr --> 0.004 | Seconds_per_step --> 3.396 | [2024-08-11 05:16:55,568][Main][INFO] - [train] Step 43000 out of 80000 | Loss --> 1.893 | Grad_l2 --> 0.303 | Weights_l2 --> 9099.732 | Lr --> 0.004 | Seconds_per_step --> 3.410 | [2024-08-11 05:19:44,354][Main][INFO] - [train] Step 43050 out of 80000 | Loss --> 1.886 | Grad_l2 --> 0.301 | Weights_l2 --> 9099.800 | Lr --> 0.004 | Seconds_per_step --> 3.376 | [2024-08-11 05:22:33,664][Main][INFO] - [train] Step 43100 out of 80000 | Loss --> 1.882 | Grad_l2 --> 
0.301 | Weights_l2 --> 9099.887 | Lr --> 0.004 | Seconds_per_step --> 3.386 | [2024-08-11 05:25:22,103][Main][INFO] - [train] Step 43150 out of 80000 | Loss --> 1.877 | Grad_l2 --> 0.306 | Weights_l2 --> 9099.968 | Lr --> 0.004 | Seconds_per_step --> 3.369 | [2024-08-11 05:28:12,205][Main][INFO] - [train] Step 43200 out of 80000 | Loss --> 1.881 | Grad_l2 --> 0.307 | Weights_l2 --> 9100.053 | Lr --> 0.004 | Seconds_per_step --> 3.402 | [2024-08-11 05:31:02,600][Main][INFO] - [train] Step 43250 out of 80000 | Loss --> 1.875 | Grad_l2 --> 0.307 | Weights_l2 --> 9100.120 | Lr --> 0.004 | Seconds_per_step --> 3.408 | [2024-08-11 05:33:51,662][Main][INFO] - [train] Step 43300 out of 80000 | Loss --> 1.885 | Grad_l2 --> 0.302 | Weights_l2 --> 9100.187 | Lr --> 0.004 | Seconds_per_step --> 3.381 | [2024-08-11 05:36:40,598][Main][INFO] - [train] Step 43350 out of 80000 | Loss --> 1.884 | Grad_l2 --> 0.304 | Weights_l2 --> 9100.247 | Lr --> 0.004 | Seconds_per_step --> 3.379 | [2024-08-11 05:39:30,749][Main][INFO] - [train] Step 43400 out of 80000 | Loss --> 1.873 | Grad_l2 --> 0.306 | Weights_l2 --> 9100.324 | Lr --> 0.004 | Seconds_per_step --> 3.403 | [2024-08-11 05:42:19,147][Main][INFO] - [train] Step 43450 out of 80000 | Loss --> 1.870 | Grad_l2 --> 0.303 | Weights_l2 --> 9100.419 | Lr --> 0.004 | Seconds_per_step --> 3.368 | [2024-08-11 05:45:08,526][Main][INFO] - [train] Step 43500 out of 80000 | Loss --> 1.873 | Grad_l2 --> 0.303 | Weights_l2 --> 9100.487 | Lr --> 0.004 | Seconds_per_step --> 3.388 | [2024-08-11 05:47:57,721][Main][INFO] - [train] Step 43550 out of 80000 | Loss --> 1.876 | Grad_l2 --> 0.304 | Weights_l2 --> 9100.561 | Lr --> 0.004 | Seconds_per_step --> 3.384 | [2024-08-11 05:50:44,763][Main][INFO] - [train] Step 43600 out of 80000 | Loss --> 1.879 | Grad_l2 --> 0.309 | Weights_l2 --> 9100.639 | Lr --> 0.004 | Seconds_per_step --> 3.341 | [2024-08-11 05:53:32,976][Main][INFO] - [train] Step 43650 out of 80000 | Loss --> 1.876 | Grad_l2 --> 0.303 | 
Weights_l2 --> 9100.704 | Lr --> 0.004 | Seconds_per_step --> 3.364 | [2024-08-11 05:56:21,776][Main][INFO] - [train] Step 43700 out of 80000 | Loss --> 1.874 | Grad_l2 --> 0.304 | Weights_l2 --> 9100.768 | Lr --> 0.004 | Seconds_per_step --> 3.376 | [2024-08-11 05:59:08,951][Main][INFO] - [train] Step 43750 out of 80000 | Loss --> 1.862 | Grad_l2 --> 0.303 | Weights_l2 --> 9100.837 | Lr --> 0.004 | Seconds_per_step --> 3.343 | [2024-08-11 06:01:57,552][Main][INFO] - [train] Step 43800 out of 80000 | Loss --> 1.872 | Grad_l2 --> 0.306 | Weights_l2 --> 9100.908 | Lr --> 0.004 | Seconds_per_step --> 3.372 | [2024-08-11 06:04:47,762][Main][INFO] - [train] Step 43850 out of 80000 | Loss --> 1.873 | Grad_l2 --> 0.306 | Weights_l2 --> 9100.958 | Lr --> 0.004 | Seconds_per_step --> 3.404 | [2024-08-11 06:07:36,787][Main][INFO] - [train] Step 43900 out of 80000 | Loss --> 1.878 | Grad_l2 --> 0.306 | Weights_l2 --> 9101.039 | Lr --> 0.004 | Seconds_per_step --> 3.380 | [2024-08-11 06:10:25,689][Main][INFO] - [train] Step 43950 out of 80000 | Loss --> 1.871 | Grad_l2 --> 0.303 | Weights_l2 --> 9101.105 | Lr --> 0.004 | Seconds_per_step --> 3.378 | [2024-08-11 06:13:14,016][Main][INFO] - [train] Step 44000 out of 80000 | Loss --> 1.871 | Grad_l2 --> 0.306 | Weights_l2 --> 9101.158 | Lr --> 0.004 | Seconds_per_step --> 3.367 | [2024-08-11 06:16:02,351][Main][INFO] - [train] Step 44050 out of 80000 | Loss --> 1.859 | Grad_l2 --> 0.303 | Weights_l2 --> 9101.227 | Lr --> 0.004 | Seconds_per_step --> 3.367 | [2024-08-11 06:18:51,302][Main][INFO] - [train] Step 44100 out of 80000 | Loss --> 1.864 | Grad_l2 --> 0.301 | Weights_l2 --> 9101.284 | Lr --> 0.004 | Seconds_per_step --> 3.379 | [2024-08-11 06:21:40,055][Main][INFO] - [train] Step 44150 out of 80000 | Loss --> 1.857 | Grad_l2 --> 0.304 | Weights_l2 --> 9101.335 | Lr --> 0.004 | Seconds_per_step --> 3.375 | [2024-08-11 06:24:28,191][Main][INFO] - [train] Step 44200 out of 80000 | Loss --> 1.858 | Grad_l2 --> 0.301 | 
Weights_l2 --> 9101.392 | Lr --> 0.004 | Seconds_per_step --> 3.363 | [2024-08-11 06:27:17,162][Main][INFO] - [train] Step 44250 out of 80000 | Loss --> 1.861 | Grad_l2 --> 0.304 | Weights_l2 --> 9101.436 | Lr --> 0.004 | Seconds_per_step --> 3.379 | [2024-08-11 06:30:06,913][Main][INFO] - [train] Step 44300 out of 80000 | Loss --> 1.862 | Grad_l2 --> 0.303 | Weights_l2 --> 9101.494 | Lr --> 0.004 | Seconds_per_step --> 3.395 | [2024-08-11 06:32:55,550][Main][INFO] - [train] Step 44350 out of 80000 | Loss --> 1.855 | Grad_l2 --> 0.303 | Weights_l2 --> 9101.548 | Lr --> 0.004 | Seconds_per_step --> 3.373 | [2024-08-11 06:35:45,051][Main][INFO] - [train] Step 44400 out of 80000 | Loss --> 1.858 | Grad_l2 --> 0.302 | Weights_l2 --> 9101.593 | Lr --> 0.004 | Seconds_per_step --> 3.390 | [2024-08-11 06:38:35,493][Main][INFO] - [train] Step 44450 out of 80000 | Loss --> 1.856 | Grad_l2 --> 0.303 | Weights_l2 --> 9101.643 | Lr --> 0.004 | Seconds_per_step --> 3.409 | [2024-08-11 06:41:25,254][Main][INFO] - [train] Step 44500 out of 80000 | Loss --> 1.861 | Grad_l2 --> 0.303 | Weights_l2 --> 9101.695 | Lr --> 0.004 | Seconds_per_step --> 3.395 | [2024-08-11 06:44:14,450][Main][INFO] - [train] Step 44550 out of 80000 | Loss --> 1.861 | Grad_l2 --> 0.299 | Weights_l2 --> 9101.731 | Lr --> 0.004 | Seconds_per_step --> 3.384 | [2024-08-11 06:47:02,935][Main][INFO] - [train] Step 44600 out of 80000 | Loss --> 1.856 | Grad_l2 --> 0.300 | Weights_l2 --> 9101.771 | Lr --> 0.004 | Seconds_per_step --> 3.370 | [2024-08-11 06:49:52,282][Main][INFO] - [train] Step 44650 out of 80000 | Loss --> 1.862 | Grad_l2 --> 0.299 | Weights_l2 --> 9101.813 | Lr --> 0.004 | Seconds_per_step --> 3.387 | [2024-08-11 06:52:40,849][Main][INFO] - [train] Step 44700 out of 80000 | Loss --> 1.854 | Grad_l2 --> 0.304 | Weights_l2 --> 9101.850 | Lr --> 0.004 | Seconds_per_step --> 3.371 | [2024-08-11 06:55:29,381][Main][INFO] - [train] Step 44750 out of 80000 | Loss --> 1.862 | Grad_l2 --> 0.303 | 
Weights_l2 --> 9101.887 | Lr --> 0.004 | Seconds_per_step --> 3.371 | [2024-08-11 06:58:18,119][Main][INFO] - [train] Step 44800 out of 80000 | Loss --> 1.857 | Grad_l2 --> 0.304 | Weights_l2 --> 9101.931 | Lr --> 0.004 | Seconds_per_step --> 3.375 | [2024-08-11 07:01:08,131][Main][INFO] - [train] Step 44850 out of 80000 | Loss --> 1.860 | Grad_l2 --> 0.304 | Weights_l2 --> 9101.974 | Lr --> 0.004 | Seconds_per_step --> 3.400 | [2024-08-11 07:03:56,588][Main][INFO] - [train] Step 44900 out of 80000 | Loss --> 1.867 | Grad_l2 --> 0.303 | Weights_l2 --> 9102.024 | Lr --> 0.004 | Seconds_per_step --> 3.369 | [2024-08-11 07:06:45,519][Main][INFO] - [train] Step 44950 out of 80000 | Loss --> 1.866 | Grad_l2 --> 0.304 | Weights_l2 --> 9102.067 | Lr --> 0.004 | Seconds_per_step --> 3.379 | [2024-08-11 07:09:34,423][Main][INFO] - [train] Step 45000 out of 80000 | Loss --> 1.862 | Grad_l2 --> 0.302 | Weights_l2 --> 9102.109 | Lr --> 0.004 | Seconds_per_step --> 3.378 | [2024-08-11 07:09:34,423][accelerate.accelerator][INFO] - Saving current state to checkpoint-pt-45000 [2024-08-11 07:09:34,426][accelerate.utils.other][WARNING] - Removed shared tensor {'encoder.embed_tokens.weight', 'decoder.embed_tokens.weight'} while saving. 
This should be OK, but check by verifying that you don't receive any warning while reloading [2024-08-11 07:09:36,448][accelerate.checkpointing][INFO] - Model weights saved in checkpoint-pt-45000/model.safetensors [2024-08-11 07:09:39,204][accelerate.checkpointing][INFO] - Optimizer state saved in checkpoint-pt-45000/optimizer.bin [2024-08-11 07:09:39,204][accelerate.checkpointing][INFO] - Scheduler state saved in checkpoint-pt-45000/scheduler.bin [2024-08-11 07:09:39,204][accelerate.checkpointing][INFO] - Sampler state for dataloader 0 saved in checkpoint-pt-45000/sampler.bin [2024-08-11 07:09:39,204][accelerate.checkpointing][INFO] - Sampler state for dataloader 1 saved in checkpoint-pt-45000/sampler_1.bin [2024-08-11 07:09:39,205][accelerate.checkpointing][INFO] - Random states saved in checkpoint-pt-45000/random_states_0.pkl [2024-08-11 07:12:28,043][Main][INFO] - [train] Step 45050 out of 80000 | Loss --> 1.864 | Grad_l2 --> 0.306 | Weights_l2 --> 9102.160 | Lr --> 0.004 | Seconds_per_step --> 3.472 | [2024-08-11 07:15:17,961][Main][INFO] - [train] Step 45100 out of 80000 | Loss --> 1.855 | Grad_l2 --> 0.303 | Weights_l2 --> 9102.199 | Lr --> 0.004 | Seconds_per_step --> 3.398 | [2024-08-11 07:18:07,676][Main][INFO] - [train] Step 45150 out of 80000 | Loss --> 1.872 | Grad_l2 --> 0.301 | Weights_l2 --> 9102.244 | Lr --> 0.004 | Seconds_per_step --> 3.394 | [2024-08-11 07:20:56,273][Main][INFO] - [train] Step 45200 out of 80000 | Loss --> 1.853 | Grad_l2 --> 0.303 | Weights_l2 --> 9102.292 | Lr --> 0.004 | Seconds_per_step --> 3.372 | [2024-08-11 07:23:45,427][Main][INFO] - [train] Step 45250 out of 80000 | Loss --> 1.864 | Grad_l2 --> 0.307 | Weights_l2 --> 9102.328 | Lr --> 0.004 | Seconds_per_step --> 3.383 | [2024-08-11 07:26:35,053][Main][INFO] - [train] Step 45300 out of 80000 | Loss --> 1.863 | Grad_l2 --> 0.302 | Weights_l2 --> 9102.357 | Lr --> 0.004 | Seconds_per_step --> 3.393 | [2024-08-11 07:29:23,673][Main][INFO] - [train] Step 45350 out of 80000 
| Loss --> 1.865 | Grad_l2 --> 0.303 | Weights_l2 --> 9102.392 | Lr --> 0.004 | Seconds_per_step --> 3.372 | [2024-08-11 07:32:13,388][Main][INFO] - [train] Step 45400 out of 80000 | Loss --> 1.862 | Grad_l2 --> 0.302 | Weights_l2 --> 9102.425 | Lr --> 0.004 | Seconds_per_step --> 3.394 | [2024-08-11 07:35:02,099][Main][INFO] - [train] Step 45450 out of 80000 | Loss --> 1.860 | Grad_l2 --> 0.303 | Weights_l2 --> 9102.460 | Lr --> 0.004 | Seconds_per_step --> 3.374 | [2024-08-11 07:37:52,499][Main][INFO] - [train] Step 45500 out of 80000 | Loss --> 1.860 | Grad_l2 --> 0.303 | Weights_l2 --> 9102.483 | Lr --> 0.004 | Seconds_per_step --> 3.408 | [2024-08-11 07:40:42,341][Main][INFO] - [train] Step 45550 out of 80000 | Loss --> 1.858 | Grad_l2 --> 0.303 | Weights_l2 --> 9102.508 | Lr --> 0.004 | Seconds_per_step --> 3.397 | [2024-08-11 07:43:32,253][Main][INFO] - [train] Step 45600 out of 80000 | Loss --> 1.855 | Grad_l2 --> 0.303 | Weights_l2 --> 9102.537 | Lr --> 0.004 | Seconds_per_step --> 3.398 | [2024-08-11 07:46:21,818][Main][INFO] - [train] Step 45650 out of 80000 | Loss --> 1.862 | Grad_l2 --> 0.300 | Weights_l2 --> 9102.566 | Lr --> 0.004 | Seconds_per_step --> 3.391 | [2024-08-11 07:49:11,294][Main][INFO] - [train] Step 45700 out of 80000 | Loss --> 1.863 | Grad_l2 --> 0.302 | Weights_l2 --> 9102.595 | Lr --> 0.004 | Seconds_per_step --> 3.390 | [2024-08-11 07:51:59,774][Main][INFO] - [train] Step 45750 out of 80000 | Loss --> 1.850 | Grad_l2 --> 0.304 | Weights_l2 --> 9102.612 | Lr --> 0.004 | Seconds_per_step --> 3.370 | [2024-08-11 07:54:48,582][Main][INFO] - [train] Step 45800 out of 80000 | Loss --> 1.861 | Grad_l2 --> 0.301 | Weights_l2 --> 9102.628 | Lr --> 0.004 | Seconds_per_step --> 3.376 | [2024-08-11 07:57:37,200][Main][INFO] - [train] Step 45850 out of 80000 | Loss --> 1.856 | Grad_l2 --> 0.304 | Weights_l2 --> 9102.643 | Lr --> 0.004 | Seconds_per_step --> 3.372 | [2024-08-11 08:00:26,936][Main][INFO] - [train] Step 45900 out of 80000 | Loss 
--> 1.857 | Grad_l2 --> 0.302 | Weights_l2 --> 9102.666 | Lr --> 0.004 | Seconds_per_step --> 3.395 | [2024-08-11 08:03:16,131][Main][INFO] - [train] Step 45950 out of 80000 | Loss --> 1.859 | Grad_l2 --> 0.305 | Weights_l2 --> 9102.691 | Lr --> 0.004 | Seconds_per_step --> 3.384 | [2024-08-11 08:06:04,985][Main][INFO] - [train] Step 46000 out of 80000 | Loss --> 1.857 | Grad_l2 --> 0.302 | Weights_l2 --> 9102.708 | Lr --> 0.004 | Seconds_per_step --> 3.377 | [2024-08-11 08:08:53,975][Main][INFO] - [train] Step 46050 out of 80000 | Loss --> 1.847 | Grad_l2 --> 0.304 | Weights_l2 --> 9102.731 | Lr --> 0.004 | Seconds_per_step --> 3.380 | [2024-08-11 08:11:43,887][Main][INFO] - [train] Step 46100 out of 80000 | Loss --> 1.857 | Grad_l2 --> 0.302 | Weights_l2 --> 9102.751 | Lr --> 0.004 | Seconds_per_step --> 3.398 | [2024-08-11 08:14:32,925][Main][INFO] - [train] Step 46150 out of 80000 | Loss --> 1.854 | Grad_l2 --> 0.300 | Weights_l2 --> 9102.767 | Lr --> 0.004 | Seconds_per_step --> 3.381 | [2024-08-11 08:17:21,782][Main][INFO] - [train] Step 46200 out of 80000 | Loss --> 1.851 | Grad_l2 --> 0.302 | Weights_l2 --> 9102.776 | Lr --> 0.004 | Seconds_per_step --> 3.377 | [2024-08-11 08:20:11,070][Main][INFO] - [train] Step 46250 out of 80000 | Loss --> 1.850 | Grad_l2 --> 0.302 | Weights_l2 --> 9102.791 | Lr --> 0.004 | Seconds_per_step --> 3.386 | [2024-08-11 08:23:00,985][Main][INFO] - [train] Step 46300 out of 80000 | Loss --> 1.850 | Grad_l2 --> 0.302 | Weights_l2 --> 9102.809 | Lr --> 0.004 | Seconds_per_step --> 3.398 | [2024-08-11 08:25:48,354][Main][INFO] - [train] Step 46350 out of 80000 | Loss --> 1.851 | Grad_l2 --> 0.302 | Weights_l2 --> 9102.822 | Lr --> 0.004 | Seconds_per_step --> 3.347 | [2024-08-11 08:28:37,651][Main][INFO] - [train] Step 46400 out of 80000 | Loss --> 1.848 | Grad_l2 --> 0.300 | Weights_l2 --> 9102.831 | Lr --> 0.004 | Seconds_per_step --> 3.386 | [2024-08-11 08:31:27,592][Main][INFO] - [train] Step 46450 out of 80000 | Loss --> 
1.854 | Grad_l2 --> 0.299 | Weights_l2 --> 9102.837 | Lr --> 0.004 | Seconds_per_step --> 3.399 | [2024-08-11 08:34:17,583][Main][INFO] - [train] Step 46500 out of 80000 | Loss --> 1.841 | Grad_l2 --> 0.301 | Weights_l2 --> 9102.850 | Lr --> 0.004 | Seconds_per_step --> 3.400 | [2024-08-11 08:37:06,886][Main][INFO] - [train] Step 46550 out of 80000 | Loss --> 1.849 | Grad_l2 --> 0.302 | Weights_l2 --> 9102.858 | Lr --> 0.004 | Seconds_per_step --> 3.386 | [2024-08-11 08:39:56,545][Main][INFO] - [train] Step 46600 out of 80000 | Loss --> 1.845 | Grad_l2 --> 0.303 | Weights_l2 --> 9102.855 | Lr --> 0.004 | Seconds_per_step --> 3.393 | [2024-08-11 08:42:45,502][Main][INFO] - [train] Step 46650 out of 80000 | Loss --> 1.845 | Grad_l2 --> 0.301 | Weights_l2 --> 9102.857 | Lr --> 0.004 | Seconds_per_step --> 3.379 | [2024-08-11 08:45:34,652][Main][INFO] - [train] Step 46700 out of 80000 | Loss --> 1.858 | Grad_l2 --> 0.300 | Weights_l2 --> 9102.866 | Lr --> 0.004 | Seconds_per_step --> 3.383 | [2024-08-11 08:48:24,024][Main][INFO] - [train] Step 46750 out of 80000 | Loss --> 1.857 | Grad_l2 --> 0.300 | Weights_l2 --> 9102.869 | Lr --> 0.004 | Seconds_per_step --> 3.387 | [2024-08-11 08:51:12,848][Main][INFO] - [train] Step 46800 out of 80000 | Loss --> 1.856 | Grad_l2 --> 0.304 | Weights_l2 --> 9102.865 | Lr --> 0.004 | Seconds_per_step --> 3.376 | [2024-08-11 08:54:02,438][Main][INFO] - [train] Step 46850 out of 80000 | Loss --> 1.852 | Grad_l2 --> 0.302 | Weights_l2 --> 9102.867 | Lr --> 0.004 | Seconds_per_step --> 3.392 | [2024-08-11 08:56:51,614][Main][INFO] - [train] Step 46900 out of 80000 | Loss --> 1.858 | Grad_l2 --> 0.301 | Weights_l2 --> 9102.872 | Lr --> 0.004 | Seconds_per_step --> 3.384 | [2024-08-11 08:59:41,654][Main][INFO] - [train] Step 46950 out of 80000 | Loss --> 1.861 | Grad_l2 --> 0.306 | Weights_l2 --> 9102.882 | Lr --> 0.004 | Seconds_per_step --> 3.401 | [2024-08-11 09:02:30,942][Main][INFO] - [train] Step 47000 out of 80000 | Loss --> 1.866 | 
Grad_l2 --> 0.304 | Weights_l2 --> 9102.872 | Lr --> 0.004 | Seconds_per_step --> 3.386 | [2024-08-11 09:05:19,246][Main][INFO] - [train] Step 47050 out of 80000 | Loss --> 1.868 | Grad_l2 --> 0.301 | Weights_l2 --> 9102.874 | Lr --> 0.004 | Seconds_per_step --> 3.366 | [2024-08-11 09:08:08,518][Main][INFO] - [train] Step 47100 out of 80000 | Loss --> 1.860 | Grad_l2 --> 0.303 | Weights_l2 --> 9102.877 | Lr --> 0.004 | Seconds_per_step --> 3.385 | [2024-08-11 09:10:58,772][Main][INFO] - [train] Step 47150 out of 80000 | Loss --> 1.870 | Grad_l2 --> 0.301 | Weights_l2 --> 9102.867 | Lr --> 0.004 | Seconds_per_step --> 3.405 | [2024-08-11 09:13:49,754][Main][INFO] - [train] Step 47200 out of 80000 | Loss --> 1.853 | Grad_l2 --> 0.302 | Weights_l2 --> 9102.854 | Lr --> 0.004 | Seconds_per_step --> 3.420 | [2024-08-11 09:16:39,033][Main][INFO] - [train] Step 47250 out of 80000 | Loss --> 1.867 | Grad_l2 --> 0.303 | Weights_l2 --> 9102.846 | Lr --> 0.004 | Seconds_per_step --> 3.386 | [2024-08-11 09:19:28,134][Main][INFO] - [train] Step 47300 out of 80000 | Loss --> 1.863 | Grad_l2 --> 0.303 | Weights_l2 --> 9102.850 | Lr --> 0.004 | Seconds_per_step --> 3.382 | [2024-08-11 09:22:18,500][Main][INFO] - [train] Step 47350 out of 80000 | Loss --> 1.862 | Grad_l2 --> 0.303 | Weights_l2 --> 9102.833 | Lr --> 0.004 | Seconds_per_step --> 3.407 | [2024-08-11 09:25:07,928][Main][INFO] - [train] Step 47400 out of 80000 | Loss --> 1.865 | Grad_l2 --> 0.300 | Weights_l2 --> 9102.834 | Lr --> 0.004 | Seconds_per_step --> 3.389 | [2024-08-11 09:27:57,487][Main][INFO] - [train] Step 47450 out of 80000 | Loss --> 1.860 | Grad_l2 --> 0.302 | Weights_l2 --> 9102.833 | Lr --> 0.004 | Seconds_per_step --> 3.391 | [2024-08-11 09:30:46,725][Main][INFO] - [train] Step 47500 out of 80000 | Loss --> 1.865 | Grad_l2 --> 0.302 | Weights_l2 --> 9102.813 | Lr --> 0.004 | Seconds_per_step --> 3.385 | [2024-08-11 09:33:36,752][Main][INFO] - [train] Step 47550 out of 80000 | Loss --> 1.862 | Grad_l2 
--> 0.303 | Weights_l2 --> 9102.811 | Lr --> 0.004 | Seconds_per_step --> 3.401 | [2024-08-11 09:36:24,674][Main][INFO] - [train] Step 47600 out of 80000 | Loss --> 1.860 | Grad_l2 --> 0.302 | Weights_l2 --> 9102.806 | Lr --> 0.004 | Seconds_per_step --> 3.358 | [2024-08-11 09:39:13,509][Main][INFO] - [train] Step 47650 out of 80000 | Loss --> 1.856 | Grad_l2 --> 0.301 | Weights_l2 --> 9102.793 | Lr --> 0.004 | Seconds_per_step --> 3.377 | [2024-08-11 09:42:02,601][Main][INFO] - [train] Step 47700 out of 80000 | Loss --> 1.862 | Grad_l2 --> 0.302 | Weights_l2 --> 9102.780 | Lr --> 0.004 | Seconds_per_step --> 3.382 | [2024-08-11 09:44:50,911][Main][INFO] - [train] Step 47750 out of 80000 | Loss --> 1.857 | Grad_l2 --> 0.302 | Weights_l2 --> 9102.776 | Lr --> 0.004 | Seconds_per_step --> 3.366 | [2024-08-11 09:47:37,616][Main][INFO] - [train] Step 47800 out of 80000 | Loss --> 1.862 | Grad_l2 --> 0.303 | Weights_l2 --> 9102.762 | Lr --> 0.004 | Seconds_per_step --> 3.334 | [2024-08-11 09:50:26,867][Main][INFO] - [train] Step 47850 out of 80000 | Loss --> 1.864 | Grad_l2 --> 0.303 | Weights_l2 --> 9102.746 | Lr --> 0.003 | Seconds_per_step --> 3.385 | [2024-08-11 09:53:16,815][Main][INFO] - [train] Step 47900 out of 80000 | Loss --> 1.851 | Grad_l2 --> 0.302 | Weights_l2 --> 9102.723 | Lr --> 0.003 | Seconds_per_step --> 3.399 | [2024-08-11 09:56:06,163][Main][INFO] - [train] Step 47950 out of 80000 | Loss --> 1.855 | Grad_l2 --> 0.300 | Weights_l2 --> 9102.696 | Lr --> 0.003 | Seconds_per_step --> 3.387 | [2024-08-11 09:58:56,126][Main][INFO] - [train] Step 48000 out of 80000 | Loss --> 1.852 | Grad_l2 --> 0.301 | Weights_l2 --> 9102.680 | Lr --> 0.003 | Seconds_per_step --> 3.399 | [2024-08-11 10:01:46,482][Main][INFO] - [train] Step 48050 out of 80000 | Loss --> 1.858 | Grad_l2 --> 0.301 | Weights_l2 --> 9102.658 | Lr --> 0.003 | Seconds_per_step --> 3.407 | [2024-08-11 10:04:36,355][Main][INFO] - [train] Step 48100 out of 80000 | Loss --> 1.855 | Grad_l2 --> 
0.299 | Weights_l2 --> 9102.643 | Lr --> 0.003 | Seconds_per_step --> 3.397 | [2024-08-11 10:07:26,163][Main][INFO] - [train] Step 48150 out of 80000 | Loss --> 1.863 | Grad_l2 --> 0.302 | Weights_l2 --> 9102.618 | Lr --> 0.003 | Seconds_per_step --> 3.396 | [2024-08-11 10:10:15,109][Main][INFO] - [train] Step 48200 out of 80000 | Loss --> 1.857 | Grad_l2 --> 0.303 | Weights_l2 --> 9102.598 | Lr --> 0.003 | Seconds_per_step --> 3.379 | [2024-08-11 10:13:04,283][Main][INFO] - [train] Step 48250 out of 80000 | Loss --> 1.852 | Grad_l2 --> 0.302 | Weights_l2 --> 9102.579 | Lr --> 0.003 | Seconds_per_step --> 3.383 | [2024-08-11 10:15:56,254][Main][INFO] - [train] Step 48300 out of 80000 | Loss --> 1.856 | Grad_l2 --> 0.304 | Weights_l2 --> 9102.561 | Lr --> 0.003 | Seconds_per_step --> 3.439 | [2024-08-11 10:18:46,081][Main][INFO] - [train] Step 48350 out of 80000 | Loss --> 1.849 | Grad_l2 --> 0.300 | Weights_l2 --> 9102.531 | Lr --> 0.003 | Seconds_per_step --> 3.397 | [2024-08-11 10:21:34,933][Main][INFO] - [train] Step 48400 out of 80000 | Loss --> 1.851 | Grad_l2 --> 0.303 | Weights_l2 --> 9102.506 | Lr --> 0.003 | Seconds_per_step --> 3.377 | [2024-08-11 10:24:21,068][Main][INFO] - [train] Step 48450 out of 80000 | Loss --> 1.852 | Grad_l2 --> 0.301 | Weights_l2 --> 9102.483 | Lr --> 0.003 | Seconds_per_step --> 3.323 | [2024-08-11 10:27:09,253][Main][INFO] - [train] Step 48500 out of 80000 | Loss --> 1.842 | Grad_l2 --> 0.303 | Weights_l2 --> 9102.456 | Lr --> 0.003 | Seconds_per_step --> 3.364 | [2024-08-11 10:29:57,885][Main][INFO] - [train] Step 48550 out of 80000 | Loss --> 1.845 | Grad_l2 --> 0.303 | Weights_l2 --> 9102.436 | Lr --> 0.003 | Seconds_per_step --> 3.373 | [2024-08-11 10:32:48,240][Main][INFO] - [train] Step 48600 out of 80000 | Loss --> 1.846 | Grad_l2 --> 0.301 | Weights_l2 --> 9102.391 | Lr --> 0.003 | Seconds_per_step --> 3.407 | [2024-08-11 10:35:37,262][Main][INFO] - [train] Step 48650 out of 80000 | Loss --> 1.827 | Grad_l2 --> 0.301 | 
Weights_l2 --> 9102.374 | Lr --> 0.003 | Seconds_per_step --> 3.380 | [2024-08-11 10:38:26,196][Main][INFO] - [train] Step 48700 out of 80000 | Loss --> 1.840 | Grad_l2 --> 0.300 | Weights_l2 --> 9102.333 | Lr --> 0.003 | Seconds_per_step --> 3.379 | [2024-08-11 10:41:14,907][Main][INFO] - [train] Step 48750 out of 80000 | Loss --> 1.835 | Grad_l2 --> 0.302 | Weights_l2 --> 9102.312 | Lr --> 0.003 | Seconds_per_step --> 3.374 | [2024-08-11 10:44:04,940][Main][INFO] - [train] Step 48800 out of 80000 | Loss --> 1.830 | Grad_l2 --> 0.299 | Weights_l2 --> 9102.293 | Lr --> 0.003 | Seconds_per_step --> 3.401 | [2024-08-11 10:46:53,292][Main][INFO] - [train] Step 48850 out of 80000 | Loss --> 1.825 | Grad_l2 --> 0.301 | Weights_l2 --> 9102.255 | Lr --> 0.003 | Seconds_per_step --> 3.367 | [2024-08-11 10:49:42,903][Main][INFO] - [train] Step 48900 out of 80000 | Loss --> 1.823 | Grad_l2 --> 0.302 | Weights_l2 --> 9102.212 | Lr --> 0.003 | Seconds_per_step --> 3.392 | [2024-08-11 10:52:33,141][Main][INFO] - [train] Step 48950 out of 80000 | Loss --> 1.821 | Grad_l2 --> 0.299 | Weights_l2 --> 9102.190 | Lr --> 0.003 | Seconds_per_step --> 3.405 | [2024-08-11 10:55:23,302][Main][INFO] - [train] Step 49000 out of 80000 | Loss --> 1.821 | Grad_l2 --> 0.301 | Weights_l2 --> 9102.157 | Lr --> 0.003 | Seconds_per_step --> 3.403 | [2024-08-11 10:58:12,503][Main][INFO] - [train] Step 49050 out of 80000 | Loss --> 1.823 | Grad_l2 --> 0.301 | Weights_l2 --> 9102.124 | Lr --> 0.003 | Seconds_per_step --> 3.384 | [2024-08-11 11:01:02,175][Main][INFO] - [train] Step 49100 out of 80000 | Loss --> 1.828 | Grad_l2 --> 0.300 | Weights_l2 --> 9102.096 | Lr --> 0.003 | Seconds_per_step --> 3.393 | [2024-08-11 11:03:51,767][Main][INFO] - [train] Step 49150 out of 80000 | Loss --> 1.834 | Grad_l2 --> 0.304 | Weights_l2 --> 9102.049 | Lr --> 0.003 | Seconds_per_step --> 3.392 | [2024-08-11 11:06:42,443][Main][INFO] - [train] Step 49200 out of 80000 | Loss --> 1.830 | Grad_l2 --> 0.301 | 
Weights_l2 --> 9102.013 | Lr --> 0.003 | Seconds_per_step --> 3.414 | [2024-08-11 11:09:31,684][Main][INFO] - [train] Step 49250 out of 80000 | Loss --> 1.833 | Grad_l2 --> 0.302 | Weights_l2 --> 9101.972 | Lr --> 0.003 | Seconds_per_step --> 3.385 | [2024-08-11 11:12:20,327][Main][INFO] - [train] Step 49300 out of 80000 | Loss --> 1.842 | Grad_l2 --> 0.303 | Weights_l2 --> 9101.928 | Lr --> 0.003 | Seconds_per_step --> 3.373 | [2024-08-11 11:15:09,324][Main][INFO] - [train] Step 49350 out of 80000 | Loss --> 1.846 | Grad_l2 --> 0.304 | Weights_l2 --> 9101.889 | Lr --> 0.003 | Seconds_per_step --> 3.380 | [2024-08-11 11:18:00,030][Main][INFO] - [train] Step 49400 out of 80000 | Loss --> 1.838 | Grad_l2 --> 0.304 | Weights_l2 --> 9101.850 | Lr --> 0.003 | Seconds_per_step --> 3.414 | [2024-08-11 11:20:49,651][Main][INFO] - [train] Step 49450 out of 80000 | Loss --> 1.835 | Grad_l2 --> 0.303 | Weights_l2 --> 9101.804 | Lr --> 0.003 | Seconds_per_step --> 3.392 | [2024-08-11 11:23:37,817][Main][INFO] - [train] Step 49500 out of 80000 | Loss --> 1.843 | Grad_l2 --> 0.302 | Weights_l2 --> 9101.766 | Lr --> 0.003 | Seconds_per_step --> 3.363 | [2024-08-11 11:26:26,286][Main][INFO] - [train] Step 49550 out of 80000 | Loss --> 1.842 | Grad_l2 --> 0.304 | Weights_l2 --> 9101.726 | Lr --> 0.003 | Seconds_per_step --> 3.369 | [2024-08-11 11:29:16,394][Main][INFO] - [train] Step 49600 out of 80000 | Loss --> 1.848 | Grad_l2 --> 0.304 | Weights_l2 --> 9101.673 | Lr --> 0.003 | Seconds_per_step --> 3.402 | [2024-08-11 11:32:04,951][Main][INFO] - [train] Step 49650 out of 80000 | Loss --> 1.841 | Grad_l2 --> 0.301 | Weights_l2 --> 9101.629 | Lr --> 0.003 | Seconds_per_step --> 3.371 | [2024-08-11 11:34:54,813][Main][INFO] - [train] Step 49700 out of 80000 | Loss --> 1.852 | Grad_l2 --> 0.303 | Weights_l2 --> 9101.585 | Lr --> 0.003 | Seconds_per_step --> 3.397 | [2024-08-11 11:37:44,289][Main][INFO] - [train] Step 49750 out of 80000 | Loss --> 1.855 | Grad_l2 --> 0.306 | 
Weights_l2 --> 9101.539 | Lr --> 0.003 | Seconds_per_step --> 3.390 | [2024-08-11 11:40:33,889][Main][INFO] - [train] Step 49800 out of 80000 | Loss --> 1.849 | Grad_l2 --> 0.304 | Weights_l2 --> 9101.493 | Lr --> 0.003 | Seconds_per_step --> 3.392 | [2024-08-11 11:43:23,944][Main][INFO] - [train] Step 49850 out of 80000 | Loss --> 1.841 | Grad_l2 --> 0.304 | Weights_l2 --> 9101.449 | Lr --> 0.003 | Seconds_per_step --> 3.401 | [2024-08-11 11:46:13,525][Main][INFO] - [train] Step 49900 out of 80000 | Loss --> 1.850 | Grad_l2 --> 0.303 | Weights_l2 --> 9101.408 | Lr --> 0.003 | Seconds_per_step --> 3.392 | [2024-08-11 11:49:03,443][Main][INFO] - [train] Step 49950 out of 80000 | Loss --> 1.849 | Grad_l2 --> 0.304 | Weights_l2 --> 9101.359 | Lr --> 0.003 | Seconds_per_step --> 3.398 | [2024-08-11 11:51:53,359][Main][INFO] - [train] Step 50000 out of 80000 | Loss --> 1.849 | Grad_l2 --> 0.306 | Weights_l2 --> 9101.317 | Lr --> 0.003 | Seconds_per_step --> 3.398 | [2024-08-11 11:51:53,359][accelerate.accelerator][INFO] - Saving current state to checkpoint-pt-50000 [2024-08-11 11:51:53,362][accelerate.utils.other][WARNING] - Removed shared tensor {'encoder.embed_tokens.weight', 'decoder.embed_tokens.weight'} while saving. 
This should be OK, but check by verifying that you don't receive any warning while reloading [2024-08-11 11:51:55,493][accelerate.checkpointing][INFO] - Model weights saved in checkpoint-pt-50000/model.safetensors [2024-08-11 11:51:58,610][accelerate.checkpointing][INFO] - Optimizer state saved in checkpoint-pt-50000/optimizer.bin [2024-08-11 11:51:58,610][accelerate.checkpointing][INFO] - Scheduler state saved in checkpoint-pt-50000/scheduler.bin [2024-08-11 11:51:58,610][accelerate.checkpointing][INFO] - Sampler state for dataloader 0 saved in checkpoint-pt-50000/sampler.bin [2024-08-11 11:51:58,610][accelerate.checkpointing][INFO] - Sampler state for dataloader 1 saved in checkpoint-pt-50000/sampler_1.bin [2024-08-11 11:51:58,611][accelerate.checkpointing][INFO] - Random states saved in checkpoint-pt-50000/random_states_0.pkl [2024-08-11 11:54:48,709][Main][INFO] - [train] Step 50050 out of 80000 | Loss --> 1.848 | Grad_l2 --> 0.304 | Weights_l2 --> 9101.280 | Lr --> 0.003 | Seconds_per_step --> 3.507 | [2024-08-11 11:57:38,118][Main][INFO] - [train] Step 50100 out of 80000 | Loss --> 1.853 | Grad_l2 --> 0.305 | Weights_l2 --> 9101.232 | Lr --> 0.003 | Seconds_per_step --> 3.388 | [2024-08-11 12:00:27,137][Main][INFO] - [train] Step 50150 out of 80000 | Loss --> 1.858 | Grad_l2 --> 0.305 | Weights_l2 --> 9101.189 | Lr --> 0.003 | Seconds_per_step --> 3.380 | [2024-08-11 12:03:16,714][Main][INFO] - [train] Step 50200 out of 80000 | Loss --> 1.853 | Grad_l2 --> 0.304 | Weights_l2 --> 9101.138 | Lr --> 0.003 | Seconds_per_step --> 3.392 | [2024-08-11 12:06:06,587][Main][INFO] - [train] Step 50250 out of 80000 | Loss --> 1.863 | Grad_l2 --> 0.303 | Weights_l2 --> 9101.077 | Lr --> 0.003 | Seconds_per_step --> 3.397 | [2024-08-11 12:08:54,840][Main][INFO] - [train] Step 50300 out of 80000 | Loss --> 1.859 | Grad_l2 --> 0.308 | Weights_l2 --> 9101.026 | Lr --> 0.003 | Seconds_per_step --> 3.365 | [2024-08-11 12:11:43,859][Main][INFO] - [train] Step 50350 out of 80000 
| Loss --> 1.867 | Grad_l2 --> 0.306 | Weights_l2 --> 9100.972 | Lr --> 0.003 | Seconds_per_step --> 3.380 | [2024-08-11 12:14:31,964][Main][INFO] - [train] Step 50400 out of 80000 | Loss --> 1.858 | Grad_l2 --> 0.307 | Weights_l2 --> 9100.919 | Lr --> 0.003 | Seconds_per_step --> 3.362 | [2024-08-11 12:17:20,154][Main][INFO] - [train] Step 50450 out of 80000 | Loss --> 1.865 | Grad_l2 --> 0.306 | Weights_l2 --> 9100.876 | Lr --> 0.003 | Seconds_per_step --> 3.364 | [2024-08-11 12:20:08,016][Main][INFO] - [train] Step 50500 out of 80000 | Loss --> 1.856 | Grad_l2 --> 0.304 | Weights_l2 --> 9100.820 | Lr --> 0.003 | Seconds_per_step --> 3.357 | [2024-08-11 12:22:56,654][Main][INFO] - [train] Step 50550 out of 80000 | Loss --> 1.859 | Grad_l2 --> 0.306 | Weights_l2 --> 9100.766 | Lr --> 0.003 | Seconds_per_step --> 3.373 | [2024-08-11 12:25:46,183][Main][INFO] - [train] Step 50600 out of 80000 | Loss --> 1.859 | Grad_l2 --> 0.304 | Weights_l2 --> 9100.712 | Lr --> 0.003 | Seconds_per_step --> 3.391 | [2024-08-11 12:28:36,862][Main][INFO] - [train] Step 50650 out of 80000 | Loss --> 1.866 | Grad_l2 --> 0.304 | Weights_l2 --> 9100.660 | Lr --> 0.003 | Seconds_per_step --> 3.414 | [2024-08-11 12:31:26,475][Main][INFO] - [train] Step 50700 out of 80000 | Loss --> 1.857 | Grad_l2 --> 0.302 | Weights_l2 --> 9100.605 | Lr --> 0.003 | Seconds_per_step --> 3.392 | [2024-08-11 12:34:15,202][Main][INFO] - [train] Step 50750 out of 80000 | Loss --> 1.856 | Grad_l2 --> 0.308 | Weights_l2 --> 9100.546 | Lr --> 0.003 | Seconds_per_step --> 3.375 | [2024-08-11 12:37:04,603][Main][INFO] - [train] Step 50800 out of 80000 | Loss --> 1.857 | Grad_l2 --> 0.304 | Weights_l2 --> 9100.481 | Lr --> 0.003 | Seconds_per_step --> 3.388 | [2024-08-11 12:39:53,849][Main][INFO] - [train] Step 50850 out of 80000 | Loss --> 1.851 | Grad_l2 --> 0.304 | Weights_l2 --> 9100.431 | Lr --> 0.003 | Seconds_per_step --> 3.385 | [2024-08-11 12:42:42,323][Main][INFO] - [train] Step 50900 out of 80000 | Loss 
--> 1.856 | Grad_l2 --> 0.304 | Weights_l2 --> 9100.369 | Lr --> 0.003 | Seconds_per_step --> 3.369 | [2024-08-11 12:45:30,712][Main][INFO] - [train] Step 50950 out of 80000 | Loss --> 1.863 | Grad_l2 --> 0.304 | Weights_l2 --> 9100.297 | Lr --> 0.003 | Seconds_per_step --> 3.368 | [2024-08-11 12:48:19,531][Main][INFO] - [train] Step 51000 out of 80000 | Loss --> 1.856 | Grad_l2 --> 0.305 | Weights_l2 --> 9100.235 | Lr --> 0.003 | Seconds_per_step --> 3.376 | [2024-08-11 12:51:12,887][Main][INFO] - [train] Step 51050 out of 80000 | Loss --> 1.851 | Grad_l2 --> 0.302 | Weights_l2 --> 9100.189 | Lr --> 0.003 | Seconds_per_step --> 3.467 | [2024-08-11 12:54:01,929][Main][INFO] - [train] Step 51100 out of 80000 | Loss --> 1.847 | Grad_l2 --> 0.306 | Weights_l2 --> 9100.124 | Lr --> 0.003 | Seconds_per_step --> 3.381 | [2024-08-11 12:56:51,344][Main][INFO] - [train] Step 51150 out of 80000 | Loss --> 1.841 | Grad_l2 --> 0.305 | Weights_l2 --> 9100.062 | Lr --> 0.003 | Seconds_per_step --> 3.388 | [2024-08-11 12:59:41,233][Main][INFO] - [train] Step 51200 out of 80000 | Loss --> 1.850 | Grad_l2 --> 0.305 | Weights_l2 --> 9099.993 | Lr --> 0.003 | Seconds_per_step --> 3.398 | [2024-08-11 13:02:30,398][Main][INFO] - [train] Step 51250 out of 80000 | Loss --> 1.850 | Grad_l2 --> 0.302 | Weights_l2 --> 9099.923 | Lr --> 0.003 | Seconds_per_step --> 3.383 | [2024-08-11 13:05:18,484][Main][INFO] - [train] Step 51300 out of 80000 | Loss --> 1.852 | Grad_l2 --> 0.303 | Weights_l2 --> 9099.853 | Lr --> 0.003 | Seconds_per_step --> 3.362 | [2024-08-11 13:08:06,024][Main][INFO] - [train] Step 51350 out of 80000 | Loss --> 1.849 | Grad_l2 --> 0.304 | Weights_l2 --> 9099.783 | Lr --> 0.003 | Seconds_per_step --> 3.351 | [2024-08-11 13:10:53,915][Main][INFO] - [train] Step 51400 out of 80000 | Loss --> 1.839 | Grad_l2 --> 0.304 | Weights_l2 --> 9099.716 | Lr --> 0.003 | Seconds_per_step --> 3.358 | [2024-08-11 13:13:41,963][Main][INFO] - [train] Step 51450 out of 80000 | Loss --> 
1.845 | Grad_l2 --> 0.306 | Weights_l2 --> 9099.653 | Lr --> 0.003 | Seconds_per_step --> 3.361 | [2024-08-11 13:16:31,982][Main][INFO] - [train] Step 51500 out of 80000 | Loss --> 1.839 | Grad_l2 --> 0.302 | Weights_l2 --> 9099.578 | Lr --> 0.003 | Seconds_per_step --> 3.400 | [2024-08-11 13:19:21,318][Main][INFO] - [train] Step 51550 out of 80000 | Loss --> 1.837 | Grad_l2 --> 0.303 | Weights_l2 --> 9099.520 | Lr --> 0.003 | Seconds_per_step --> 3.387 | [2024-08-11 13:22:09,644][Main][INFO] - [train] Step 51600 out of 80000 | Loss --> 1.833 | Grad_l2 --> 0.305 | Weights_l2 --> 9099.446 | Lr --> 0.003 | Seconds_per_step --> 3.367 | [2024-08-11 13:25:06,187][Main][INFO] - [train] Step 51650 out of 80000 | Loss --> 1.842 | Grad_l2 --> 0.303 | Weights_l2 --> 9099.375 | Lr --> 0.003 | Seconds_per_step --> 3.531 | [2024-08-11 13:27:55,296][Main][INFO] - [train] Step 51700 out of 80000 | Loss --> 1.824 | Grad_l2 --> 0.304 | Weights_l2 --> 9099.306 | Lr --> 0.003 | Seconds_per_step --> 3.382 | [2024-08-11 13:30:48,050][Main][INFO] - [train] Step 51750 out of 80000 | Loss --> 1.834 | Grad_l2 --> 0.304 | Weights_l2 --> 9099.235 | Lr --> 0.003 | Seconds_per_step --> 3.455 | [2024-08-11 13:33:36,637][Main][INFO] - [train] Step 51800 out of 80000 | Loss --> 1.829 | Grad_l2 --> 0.303 | Weights_l2 --> 9099.164 | Lr --> 0.003 | Seconds_per_step --> 3.372 | [2024-08-11 13:36:25,148][Main][INFO] - [train] Step 51850 out of 80000 | Loss --> 1.831 | Grad_l2 --> 0.306 | Weights_l2 --> 9099.098 | Lr --> 0.003 | Seconds_per_step --> 3.370 | [2024-08-11 13:39:14,286][Main][INFO] - [train] Step 51900 out of 80000 | Loss --> 1.828 | Grad_l2 --> 0.304 | Weights_l2 --> 9099.024 | Lr --> 0.003 | Seconds_per_step --> 3.383 | [2024-08-11 13:42:02,662][Main][INFO] - [train] Step 51950 out of 80000 | Loss --> 1.828 | Grad_l2 --> 0.305 | Weights_l2 --> 9098.956 | Lr --> 0.003 | Seconds_per_step --> 3.368 | [2024-08-11 13:44:51,092][Main][INFO] - [train] Step 52000 out of 80000 | Loss --> 1.826 | 
Grad_l2 --> 0.305 | Weights_l2 --> 9098.885 | Lr --> 0.003 | Seconds_per_step --> 3.369 | [2024-08-11 13:47:40,032][Main][INFO] - [train] Step 52050 out of 80000 | Loss --> 1.822 | Grad_l2 --> 0.302 | Weights_l2 --> 9098.823 | Lr --> 0.003 | Seconds_per_step --> 3.379 | [2024-08-11 13:50:29,006][Main][INFO] - [train] Step 52100 out of 80000 | Loss --> 1.825 | Grad_l2 --> 0.305 | Weights_l2 --> 9098.748 | Lr --> 0.003 | Seconds_per_step --> 3.379 | [2024-08-11 13:53:17,565][Main][INFO] - [train] Step 52150 out of 80000 | Loss --> 1.823 | Grad_l2 --> 0.302 | Weights_l2 --> 9098.682 | Lr --> 0.003 | Seconds_per_step --> 3.371 | [2024-08-11 13:56:06,216][Main][INFO] - [train] Step 52200 out of 80000 | Loss --> 1.820 | Grad_l2 --> 0.303 | Weights_l2 --> 9098.606 | Lr --> 0.003 | Seconds_per_step --> 3.373 | [2024-08-11 13:58:54,102][Main][INFO] - [train] Step 52250 out of 80000 | Loss --> 1.816 | Grad_l2 --> 0.303 | Weights_l2 --> 9098.532 | Lr --> 0.003 | Seconds_per_step --> 3.358 | [2024-08-11 14:01:43,559][Main][INFO] - [train] Step 52300 out of 80000 | Loss --> 1.830 | Grad_l2 --> 0.302 | Weights_l2 --> 9098.464 | Lr --> 0.003 | Seconds_per_step --> 3.389 | [2024-08-11 14:04:32,277][Main][INFO] - [train] Step 52350 out of 80000 | Loss --> 1.814 | Grad_l2 --> 0.302 | Weights_l2 --> 9098.392 | Lr --> 0.003 | Seconds_per_step --> 3.374 | [2024-08-11 14:07:20,639][Main][INFO] - [train] Step 52400 out of 80000 | Loss --> 1.815 | Grad_l2 --> 0.306 | Weights_l2 --> 9098.315 | Lr --> 0.003 | Seconds_per_step --> 3.367 | [2024-08-11 14:10:08,465][Main][INFO] - [train] Step 52450 out of 80000 | Loss --> 1.813 | Grad_l2 --> 0.304 | Weights_l2 --> 9098.238 | Lr --> 0.003 | Seconds_per_step --> 3.357 | [2024-08-11 14:12:57,599][Main][INFO] - [train] Step 52500 out of 80000 | Loss --> 1.823 | Grad_l2 --> 0.304 | Weights_l2 --> 9098.172 | Lr --> 0.003 | Seconds_per_step --> 3.383 | [2024-08-11 14:15:45,660][Main][INFO] - [train] Step 52550 out of 80000 | Loss --> 1.813 | Grad_l2 
--> 0.306 | Weights_l2 --> 9098.102 | Lr --> 0.003 | Seconds_per_step --> 3.361 | [2024-08-11 14:18:34,118][Main][INFO] - [train] Step 52600 out of 80000 | Loss --> 1.820 | Grad_l2 --> 0.305 | Weights_l2 --> 9098.032 | Lr --> 0.003 | Seconds_per_step --> 3.369 | [2024-08-11 14:21:22,137][Main][INFO] - [train] Step 52650 out of 80000 | Loss --> 1.810 | Grad_l2 --> 0.306 | Weights_l2 --> 9097.958 | Lr --> 0.003 | Seconds_per_step --> 3.360 | [2024-08-11 14:24:11,153][Main][INFO] - [train] Step 52700 out of 80000 | Loss --> 1.819 | Grad_l2 --> 0.306 | Weights_l2 --> 9097.883 | Lr --> 0.003 | Seconds_per_step --> 3.380 | [2024-08-11 14:26:59,325][Main][INFO] - [train] Step 52750 out of 80000 | Loss --> 1.819 | Grad_l2 --> 0.305 | Weights_l2 --> 9097.801 | Lr --> 0.003 | Seconds_per_step --> 3.363 | [2024-08-11 14:29:48,152][Main][INFO] - [train] Step 52800 out of 80000 | Loss --> 1.818 | Grad_l2 --> 0.304 | Weights_l2 --> 9097.728 | Lr --> 0.003 | Seconds_per_step --> 3.377 | [2024-08-11 14:32:36,719][Main][INFO] - [train] Step 52850 out of 80000 | Loss --> 1.816 | Grad_l2 --> 0.302 | Weights_l2 --> 9097.651 | Lr --> 0.003 | Seconds_per_step --> 3.371 | [2024-08-11 14:35:26,695][Main][INFO] - [train] Step 52900 out of 80000 | Loss --> 1.821 | Grad_l2 --> 0.304 | Weights_l2 --> 9097.564 | Lr --> 0.003 | Seconds_per_step --> 3.400 | [2024-08-11 14:38:15,366][Main][INFO] - [train] Step 52950 out of 80000 | Loss --> 1.830 | Grad_l2 --> 0.306 | Weights_l2 --> 9097.480 | Lr --> 0.003 | Seconds_per_step --> 3.373 | [2024-08-11 14:41:03,151][Main][INFO] - [train] Step 53000 out of 80000 | Loss --> 1.816 | Grad_l2 --> 0.305 | Weights_l2 --> 9097.407 | Lr --> 0.003 | Seconds_per_step --> 3.356 | [2024-08-11 14:43:51,226][Main][INFO] - [train] Step 53050 out of 80000 | Loss --> 1.822 | Grad_l2 --> 0.303 | Weights_l2 --> 9097.331 | Lr --> 0.003 | Seconds_per_step --> 3.361 | [2024-08-11 14:46:39,664][Main][INFO] - [train] Step 53100 out of 80000 | Loss --> 1.814 | Grad_l2 --> 
0.304 | Weights_l2 --> 9097.250 | Lr --> 0.003 | Seconds_per_step --> 3.369 | [2024-08-11 14:49:28,881][Main][INFO] - [train] Step 53150 out of 80000 | Loss --> 1.827 | Grad_l2 --> 0.304 | Weights_l2 --> 9097.175 | Lr --> 0.003 | Seconds_per_step --> 3.384 | [2024-08-11 14:52:17,120][Main][INFO] - [train] Step 53200 out of 80000 | Loss --> 1.821 | Grad_l2 --> 0.305 | Weights_l2 --> 9097.092 | Lr --> 0.003 | Seconds_per_step --> 3.365 | [2024-08-11 14:55:05,338][Main][INFO] - [train] Step 53250 out of 80000 | Loss --> 1.820 | Grad_l2 --> 0.305 | Weights_l2 --> 9097.015 | Lr --> 0.003 | Seconds_per_step --> 3.364 | [2024-08-11 14:57:54,070][Main][INFO] - [train] Step 53300 out of 80000 | Loss --> 1.830 | Grad_l2 --> 0.304 | Weights_l2 --> 9096.937 | Lr --> 0.003 | Seconds_per_step --> 3.375 | [2024-08-11 15:00:43,188][Main][INFO] - [train] Step 53350 out of 80000 | Loss --> 1.826 | Grad_l2 --> 0.306 | Weights_l2 --> 9096.856 | Lr --> 0.003 | Seconds_per_step --> 3.382 | [2024-08-11 15:03:31,527][Main][INFO] - [train] Step 53400 out of 80000 | Loss --> 1.824 | Grad_l2 --> 0.304 | Weights_l2 --> 9096.776 | Lr --> 0.003 | Seconds_per_step --> 3.367 | [2024-08-11 15:06:19,876][Main][INFO] - [train] Step 53450 out of 80000 | Loss --> 1.834 | Grad_l2 --> 0.306 | Weights_l2 --> 9096.693 | Lr --> 0.003 | Seconds_per_step --> 3.367 | [2024-08-11 15:09:08,569][Main][INFO] - [train] Step 53500 out of 80000 | Loss --> 1.819 | Grad_l2 --> 0.305 | Weights_l2 --> 9096.607 | Lr --> 0.003 | Seconds_per_step --> 3.374 | [2024-08-11 15:11:58,235][Main][INFO] - [train] Step 53550 out of 80000 | Loss --> 1.827 | Grad_l2 --> 0.308 | Weights_l2 --> 9096.526 | Lr --> 0.003 | Seconds_per_step --> 3.393 | [2024-08-11 15:15:16,128][Main][INFO] - [train] Step 53600 out of 80000 | Loss --> 1.830 | Grad_l2 --> 0.303 | Weights_l2 --> 9096.440 | Lr --> 0.003 | Seconds_per_step --> 3.958 | [2024-08-11 15:19:21,909][Main][INFO] - [train] Step 53650 out of 80000 | Loss --> 1.823 | Grad_l2 --> 0.307 | 
Weights_l2 --> 9096.354 | Lr --> 0.002 | Seconds_per_step --> 4.916 | [2024-08-11 15:23:19,247][Main][INFO] - [train] Step 53700 out of 80000 | Loss --> 1.822 | Grad_l2 --> 0.305 | Weights_l2 --> 9096.265 | Lr --> 0.002 | Seconds_per_step --> 4.747 | [2024-08-11 15:27:16,170][Main][INFO] - [train] Step 53750 out of 80000 | Loss --> 1.822 | Grad_l2 --> 0.307 | Weights_l2 --> 9096.176 | Lr --> 0.002 | Seconds_per_step --> 4.738 | [2024-08-11 15:31:18,960][Main][INFO] - [train] Step 53800 out of 80000 | Loss --> 1.819 | Grad_l2 --> 0.304 | Weights_l2 --> 9096.085 | Lr --> 0.002 | Seconds_per_step --> 4.856 | [2024-08-11 15:35:29,615][Main][INFO] - [train] Step 53850 out of 80000 | Loss --> 1.821 | Grad_l2 --> 0.306 | Weights_l2 --> 9095.991 | Lr --> 0.002 | Seconds_per_step --> 5.013 | [2024-08-11 15:39:24,381][Main][INFO] - [train] Step 53900 out of 80000 | Loss --> 1.824 | Grad_l2 --> 0.307 | Weights_l2 --> 9095.907 | Lr --> 0.002 | Seconds_per_step --> 4.695 | [2024-08-11 15:43:26,740][Main][INFO] - [train] Step 53950 out of 80000 | Loss --> 1.828 | Grad_l2 --> 0.309 | Weights_l2 --> 9095.828 | Lr --> 0.002 | Seconds_per_step --> 4.847 | [2024-08-11 15:47:32,997][Main][INFO] - [train] Step 54000 out of 80000 | Loss --> 1.820 | Grad_l2 --> 0.306 | Weights_l2 --> 9095.743 | Lr --> 0.002 | Seconds_per_step --> 4.925 | [2024-08-11 15:51:31,612][Main][INFO] - [train] Step 54050 out of 80000 | Loss --> 1.827 | Grad_l2 --> 0.305 | Weights_l2 --> 9095.659 | Lr --> 0.002 | Seconds_per_step --> 4.772 | [2024-08-11 15:55:28,280][Main][INFO] - [train] Step 54100 out of 80000 | Loss --> 1.820 | Grad_l2 --> 0.306 | Weights_l2 --> 9095.573 | Lr --> 0.002 | Seconds_per_step --> 4.733 | [2024-08-11 15:59:31,710][Main][INFO] - [train] Step 54150 out of 80000 | Loss --> 1.820 | Grad_l2 --> 0.307 | Weights_l2 --> 9095.484 | Lr --> 0.002 | Seconds_per_step --> 4.869 | [2024-08-11 16:03:47,362][Main][INFO] - [train] Step 54200 out of 80000 | Loss --> 1.821 | Grad_l2 --> 0.309 | 
Weights_l2 --> 9095.393 | Lr --> 0.002 | Seconds_per_step --> 5.113 | [2024-08-11 16:07:43,746][Main][INFO] - [train] Step 54250 out of 80000 | Loss --> 1.823 | Grad_l2 --> 0.304 | Weights_l2 --> 9095.311 | Lr --> 0.002 | Seconds_per_step --> 4.728 | [2024-08-11 16:11:42,782][Main][INFO] - [train] Step 54300 out of 80000 | Loss --> 1.825 | Grad_l2 --> 0.306 | Weights_l2 --> 9095.229 | Lr --> 0.002 | Seconds_per_step --> 4.781 | [2024-08-11 16:15:51,539][Main][INFO] - [train] Step 54350 out of 80000 | Loss --> 1.815 | Grad_l2 --> 0.305 | Weights_l2 --> 9095.141 | Lr --> 0.002 | Seconds_per_step --> 4.975 | [2024-08-11 16:19:57,704][Main][INFO] - [train] Step 54400 out of 80000 | Loss --> 1.825 | Grad_l2 --> 0.306 | Weights_l2 --> 9095.050 | Lr --> 0.002 | Seconds_per_step --> 4.923 | [2024-08-11 16:23:55,708][Main][INFO] - [train] Step 54450 out of 80000 | Loss --> 1.818 | Grad_l2 --> 0.307 | Weights_l2 --> 9094.969 | Lr --> 0.002 | Seconds_per_step --> 4.760 | [2024-08-11 16:27:58,061][Main][INFO] - [train] Step 54500 out of 80000 | Loss --> 1.821 | Grad_l2 --> 0.305 | Weights_l2 --> 9094.882 | Lr --> 0.002 | Seconds_per_step --> 4.847 | [2024-08-11 16:32:13,877][Main][INFO] - [train] Step 54550 out of 80000 | Loss --> 1.822 | Grad_l2 --> 0.304 | Weights_l2 --> 9094.795 | Lr --> 0.002 | Seconds_per_step --> 5.116 | [2024-08-11 16:36:14,196][Main][INFO] - [train] Step 54600 out of 80000 | Loss --> 1.814 | Grad_l2 --> 0.307 | Weights_l2 --> 9094.709 | Lr --> 0.002 | Seconds_per_step --> 4.806 | [2024-08-11 16:40:14,393][Main][INFO] - [train] Step 54650 out of 80000 | Loss --> 1.820 | Grad_l2 --> 0.304 | Weights_l2 --> 9094.615 | Lr --> 0.002 | Seconds_per_step --> 4.804 | [2024-08-11 16:44:22,066][Main][INFO] - [train] Step 54700 out of 80000 | Loss --> 1.814 | Grad_l2 --> 0.307 | Weights_l2 --> 9094.525 | Lr --> 0.002 | Seconds_per_step --> 4.953 | [2024-08-11 16:48:21,020][Main][INFO] - [train] Step 54750 out of 80000 | Loss --> 1.820 | Grad_l2 --> 0.308 | 
Weights_l2 --> 9094.438 | Lr --> 0.002 | Seconds_per_step --> 4.779 | [2024-08-11 16:52:25,503][Main][INFO] - [train] Step 54800 out of 80000 | Loss --> 1.817 | Grad_l2 --> 0.306 | Weights_l2 --> 9094.352 | Lr --> 0.002 | Seconds_per_step --> 4.890 | [2024-08-11 16:56:33,065][Main][INFO] - [train] Step 54850 out of 80000 | Loss --> 1.812 | Grad_l2 --> 0.306 | Weights_l2 --> 9094.259 | Lr --> 0.002 | Seconds_per_step --> 4.951 | [2024-08-11 17:00:52,228][Main][INFO] - [train] Step 54900 out of 80000 | Loss --> 1.811 | Grad_l2 --> 0.305 | Weights_l2 --> 9094.169 | Lr --> 0.002 | Seconds_per_step --> 5.183 | [2024-08-11 17:04:52,733][Main][INFO] - [train] Step 54950 out of 80000 | Loss --> 1.801 | Grad_l2 --> 0.307 | Weights_l2 --> 9094.079 | Lr --> 0.002 | Seconds_per_step --> 4.810 | [2024-08-11 17:08:57,633][Main][INFO] - [train] Step 55000 out of 80000 | Loss --> 1.818 | Grad_l2 --> 0.308 | Weights_l2 --> 9093.989 | Lr --> 0.002 | Seconds_per_step --> 4.898 | [2024-08-11 17:08:57,633][accelerate.accelerator][INFO] - Saving current state to checkpoint-pt-55000 [2024-08-11 17:08:57,637][accelerate.utils.other][WARNING] - Removed shared tensor {'encoder.embed_tokens.weight', 'decoder.embed_tokens.weight'} while saving. 
This should be OK, but check by verifying that you don't receive any warning while reloading [2024-08-11 17:09:01,182][accelerate.checkpointing][INFO] - Model weights saved in checkpoint-pt-55000/model.safetensors [2024-08-11 17:09:08,530][accelerate.checkpointing][INFO] - Optimizer state saved in checkpoint-pt-55000/optimizer.bin [2024-08-11 17:09:08,531][accelerate.checkpointing][INFO] - Scheduler state saved in checkpoint-pt-55000/scheduler.bin [2024-08-11 17:09:08,531][accelerate.checkpointing][INFO] - Sampler state for dataloader 0 saved in checkpoint-pt-55000/sampler.bin [2024-08-11 17:09:08,531][accelerate.checkpointing][INFO] - Sampler state for dataloader 1 saved in checkpoint-pt-55000/sampler_1.bin [2024-08-11 17:09:08,532][accelerate.checkpointing][INFO] - Random states saved in checkpoint-pt-55000/random_states_0.pkl [2024-08-11 17:13:21,437][Main][INFO] - [train] Step 55050 out of 80000 | Loss --> 1.807 | Grad_l2 --> 0.304 | Weights_l2 --> 9093.890 | Lr --> 0.002 | Seconds_per_step --> 5.276 | [2024-08-11 17:17:31,393][Main][INFO] - [train] Step 55100 out of 80000 | Loss --> 1.806 | Grad_l2 --> 0.307 | Weights_l2 --> 9093.799 | Lr --> 0.002 | Seconds_per_step --> 4.999 | [2024-08-11 17:21:35,081][Main][INFO] - [train] Step 55150 out of 80000 | Loss --> 1.809 | Grad_l2 --> 0.306 | Weights_l2 --> 9093.712 | Lr --> 0.002 | Seconds_per_step --> 4.874 | [2024-08-11 17:25:45,792][Main][INFO] - [train] Step 55200 out of 80000 | Loss --> 1.801 | Grad_l2 --> 0.305 | Weights_l2 --> 9093.624 | Lr --> 0.002 | Seconds_per_step --> 5.014 | [2024-08-11 17:30:02,450][Main][INFO] - [train] Step 55250 out of 80000 | Loss --> 1.796 | Grad_l2 --> 0.306 | Weights_l2 --> 9093.529 | Lr --> 0.002 | Seconds_per_step --> 5.133 | [2024-08-11 17:34:08,579][Main][INFO] - [train] Step 55300 out of 80000 | Loss --> 1.802 | Grad_l2 --> 0.307 | Weights_l2 --> 9093.433 | Lr --> 0.002 | Seconds_per_step --> 4.922 | [2024-08-11 17:38:13,235][Main][INFO] - [train] Step 55350 out of 80000 
| Loss --> 1.796 | Grad_l2 --> 0.306 | Weights_l2 --> 9093.344 | Lr --> 0.002 | Seconds_per_step --> 4.893 | [2024-08-11 17:42:27,852][Main][INFO] - [train] Step 55400 out of 80000 | Loss --> 1.800 | Grad_l2 --> 0.305 | Weights_l2 --> 9093.242 | Lr --> 0.002 | Seconds_per_step --> 5.092 | [2024-08-11 17:46:22,982][Main][INFO] - [train] Step 55450 out of 80000 | Loss --> 1.802 | Grad_l2 --> 0.304 | Weights_l2 --> 9093.152 | Lr --> 0.002 | Seconds_per_step --> 4.703 | [2024-08-11 17:50:20,417][Main][INFO] - [train] Step 55500 out of 80000 | Loss --> 1.802 | Grad_l2 --> 0.306 | Weights_l2 --> 9093.070 | Lr --> 0.002 | Seconds_per_step --> 4.749 | [2024-08-11 17:54:30,178][Main][INFO] - [train] Step 55550 out of 80000 | Loss --> 1.811 | Grad_l2 --> 0.307 | Weights_l2 --> 9092.986 | Lr --> 0.002 | Seconds_per_step --> 4.995 | [2024-08-11 17:58:42,001][Main][INFO] - [train] Step 55600 out of 80000 | Loss --> 1.804 | Grad_l2 --> 0.306 | Weights_l2 --> 9092.885 | Lr --> 0.002 | Seconds_per_step --> 5.036 | [2024-08-11 18:02:39,257][Main][INFO] - [train] Step 55650 out of 80000 | Loss --> 1.804 | Grad_l2 --> 0.311 | Weights_l2 --> 9092.792 | Lr --> 0.002 | Seconds_per_step --> 4.745 | [2024-08-11 18:06:36,810][Main][INFO] - [train] Step 55700 out of 80000 | Loss --> 1.797 | Grad_l2 --> 0.306 | Weights_l2 --> 9092.687 | Lr --> 0.002 | Seconds_per_step --> 4.751 | [2024-08-11 18:10:48,385][Main][INFO] - [train] Step 55750 out of 80000 | Loss --> 1.805 | Grad_l2 --> 0.308 | Weights_l2 --> 9092.598 | Lr --> 0.002 | Seconds_per_step --> 5.031 | [2024-08-11 18:14:53,396][Main][INFO] - [train] Step 55800 out of 80000 | Loss --> 1.813 | Grad_l2 --> 0.308 | Weights_l2 --> 9092.501 | Lr --> 0.002 | Seconds_per_step --> 4.900 | [2024-08-11 18:18:51,650][Main][INFO] - [train] Step 55850 out of 80000 | Loss --> 1.812 | Grad_l2 --> 0.308 | Weights_l2 --> 9092.395 | Lr --> 0.002 | Seconds_per_step --> 4.765 | [2024-08-11 18:23:03,278][Main][INFO] - [train] Step 55900 out of 80000 | Loss 
--> 1.810 | Grad_l2 --> 0.306 | Weights_l2 --> 9092.299 | Lr --> 0.002 | Seconds_per_step --> 5.033 | [2024-08-11 18:27:14,679][Main][INFO] - [train] Step 55950 out of 80000 | Loss --> 1.811 | Grad_l2 --> 0.309 | Weights_l2 --> 9092.202 | Lr --> 0.002 | Seconds_per_step --> 5.028 | [2024-08-11 18:31:16,692][Main][INFO] - [train] Step 56000 out of 80000 | Loss --> 1.816 | Grad_l2 --> 0.306 | Weights_l2 --> 9092.110 | Lr --> 0.002 | Seconds_per_step --> 4.840 | [2024-08-11 18:35:23,672][Main][INFO] - [train] Step 56050 out of 80000 | Loss --> 1.811 | Grad_l2 --> 0.309 | Weights_l2 --> 9092.013 | Lr --> 0.002 | Seconds_per_step --> 4.940 | [2024-08-11 18:39:34,197][Main][INFO] - [train] Step 56100 out of 80000 | Loss --> 1.803 | Grad_l2 --> 0.308 | Weights_l2 --> 9091.909 | Lr --> 0.002 | Seconds_per_step --> 5.010 | [2024-08-11 18:43:29,112][Main][INFO] - [train] Step 56150 out of 80000 | Loss --> 1.811 | Grad_l2 --> 0.306 | Weights_l2 --> 9091.817 | Lr --> 0.002 | Seconds_per_step --> 4.698 | [2024-08-11 18:47:29,146][Main][INFO] - [train] Step 56200 out of 80000 | Loss --> 1.809 | Grad_l2 --> 0.307 | Weights_l2 --> 9091.721 | Lr --> 0.002 | Seconds_per_step --> 4.801 | [2024-08-11 18:51:37,058][Main][INFO] - [train] Step 56250 out of 80000 | Loss --> 1.810 | Grad_l2 --> 0.306 | Weights_l2 --> 9091.613 | Lr --> 0.002 | Seconds_per_step --> 4.958 | [2024-08-11 18:55:48,367][Main][INFO] - [train] Step 56300 out of 80000 | Loss --> 1.813 | Grad_l2 --> 0.309 | Weights_l2 --> 9091.518 | Lr --> 0.002 | Seconds_per_step --> 5.026 | [2024-08-11 18:59:48,900][Main][INFO] - [train] Step 56350 out of 80000 | Loss --> 1.813 | Grad_l2 --> 0.309 | Weights_l2 --> 9091.421 | Lr --> 0.002 | Seconds_per_step --> 4.811 | [2024-08-11 19:03:49,099][Main][INFO] - [train] Step 56400 out of 80000 | Loss --> 1.803 | Grad_l2 --> 0.310 | Weights_l2 --> 9091.329 | Lr --> 0.002 | Seconds_per_step --> 4.804 | [2024-08-11 19:08:07,847][Main][INFO] - [train] Step 56450 out of 80000 | Loss --> 
1.806 | Grad_l2 --> 0.309 | Weights_l2 --> 9091.234 | Lr --> 0.002 | Seconds_per_step --> 5.175 | [2024-08-11 19:12:12,785][Main][INFO] - [train] Step 56500 out of 80000 | Loss --> 1.804 | Grad_l2 --> 0.310 | Weights_l2 --> 9091.130 | Lr --> 0.002 | Seconds_per_step --> 4.899 | [2024-08-11 19:16:07,111][Main][INFO] - [train] Step 56550 out of 80000 | Loss --> 1.809 | Grad_l2 --> 0.307 | Weights_l2 --> 9091.031 | Lr --> 0.002 | Seconds_per_step --> 4.687 | [2024-08-11 19:20:17,900][Main][INFO] - [train] Step 56600 out of 80000 | Loss --> 1.807 | Grad_l2 --> 0.306 | Weights_l2 --> 9090.943 | Lr --> 0.002 | Seconds_per_step --> 5.016 | [2024-08-11 19:24:29,336][Main][INFO] - [train] Step 56650 out of 80000 | Loss --> 1.816 | Grad_l2 --> 0.307 | Weights_l2 --> 9090.840 | Lr --> 0.002 | Seconds_per_step --> 5.029 | [2024-08-11 19:28:33,570][Main][INFO] - [train] Step 56700 out of 80000 | Loss --> 1.804 | Grad_l2 --> 0.311 | Weights_l2 --> 9090.737 | Lr --> 0.002 | Seconds_per_step --> 4.885 | [2024-08-11 19:32:34,870][Main][INFO] - [train] Step 56750 out of 80000 | Loss --> 1.807 | Grad_l2 --> 0.308 | Weights_l2 --> 9090.642 | Lr --> 0.002 | Seconds_per_step --> 4.826 | [2024-08-11 19:36:48,798][Main][INFO] - [train] Step 56800 out of 80000 | Loss --> 1.806 | Grad_l2 --> 0.307 | Weights_l2 --> 9090.549 | Lr --> 0.002 | Seconds_per_step --> 5.079 | [2024-08-11 19:40:53,609][Main][INFO] - [train] Step 56850 out of 80000 | Loss --> 1.799 | Grad_l2 --> 0.308 | Weights_l2 --> 9090.450 | Lr --> 0.002 | Seconds_per_step --> 4.896 | [2024-08-11 19:44:48,784][Main][INFO] - [train] Step 56900 out of 80000 | Loss --> 1.803 | Grad_l2 --> 0.309 | Weights_l2 --> 9090.349 | Lr --> 0.002 | Seconds_per_step --> 4.703 | [2024-08-11 19:48:55,965][Main][INFO] - [train] Step 56950 out of 80000 | Loss --> 1.799 | Grad_l2 --> 0.307 | Weights_l2 --> 9090.256 | Lr --> 0.002 | Seconds_per_step --> 4.944 | [2024-08-11 19:53:02,054][Main][INFO] - [train] Step 57000 out of 80000 | Loss --> 1.797 | 
Grad_l2 --> 0.308 | Weights_l2 --> 9090.160 | Lr --> 0.002 | Seconds_per_step --> 4.922 | [2024-08-11 19:56:59,854][Main][INFO] - [train] Step 57050 out of 80000 | Loss --> 1.795 | Grad_l2 --> 0.308 | Weights_l2 --> 9090.065 | Lr --> 0.002 | Seconds_per_step --> 4.756 | [2024-08-11 19:57:13,264][huggingface_hub.utils._http][WARNING] - '(ReadTimeoutError("HTTPSConnectionPool(host='huggingface.co', port=443): Read timed out. (read timeout=10)"), '(Request ID: 425286f4-04eb-4af4-9171-eff7b1e97f3d)')' thrown while requesting GET https://huggingface.co/datasets/HuggingFaceTB/smollm-corpus/resolve/c074f3d3783ef8c321b40fd89088e5955cd05bad/fineweb-edu-dedup/train-00193-of-00234.parquet [2024-08-11 19:57:13,265][huggingface_hub.utils._http][WARNING] - Retrying in 1s [Retry 1/5]. [2024-08-11 19:57:24,310][huggingface_hub.utils._http][WARNING] - '(ReadTimeoutError("HTTPSConnectionPool(host='huggingface.co', port=443): Read timed out. (read timeout=10)"), '(Request ID: 66d3c9a6-7e72-41be-9ff4-83977d484f23)')' thrown while requesting GET https://huggingface.co/datasets/HuggingFaceTB/smollm-corpus/resolve/c074f3d3783ef8c321b40fd89088e5955cd05bad/fineweb-edu-dedup/train-00193-of-00234.parquet [2024-08-11 19:57:24,313][huggingface_hub.utils._http][WARNING] - Retrying in 2s [Retry 2/5]. [2024-08-11 19:57:36,430][huggingface_hub.utils._http][WARNING] - '(ReadTimeoutError("HTTPSConnectionPool(host='huggingface.co', port=443): Read timed out. (read timeout=10)"), '(Request ID: 1856b455-849b-45df-b1c0-271375bee1dd)')' thrown while requesting GET https://huggingface.co/datasets/HuggingFaceTB/smollm-corpus/resolve/c074f3d3783ef8c321b40fd89088e5955cd05bad/fineweb-edu-dedup/train-00193-of-00234.parquet [2024-08-11 19:57:36,433][huggingface_hub.utils._http][WARNING] - Retrying in 4s [Retry 3/5]. 
[2024-08-11 20:01:49,199][Main][INFO] - [train] Step 57100 out of 80000 | Loss --> 1.788 | Grad_l2 --> 0.307 | Weights_l2 --> 9089.959 | Lr --> 0.002 | Seconds_per_step --> 5.787 | [2024-08-11 20:05:56,117][Main][INFO] - [train] Step 57150 out of 80000 | Loss --> 1.792 | Grad_l2 --> 0.308 | Weights_l2 --> 9089.859 | Lr --> 0.002 | Seconds_per_step --> 4.938 | [2024-08-11 20:09:54,672][Main][INFO] - [train] Step 57200 out of 80000 | Loss --> 1.787 | Grad_l2 --> 0.305 | Weights_l2 --> 9089.765 | Lr --> 0.002 | Seconds_per_step --> 4.771 | [2024-08-11 20:13:52,764][Main][INFO] - [train] Step 57250 out of 80000 | Loss --> 1.804 | Grad_l2 --> 0.307 | Weights_l2 --> 9089.666 | Lr --> 0.002 | Seconds_per_step --> 4.762 | [2024-08-11 20:17:56,117][Main][INFO] - [train] Step 57300 out of 80000 | Loss --> 1.790 | Grad_l2 --> 0.308 | Weights_l2 --> 9089.561 | Lr --> 0.002 | Seconds_per_step --> 4.867 | [2024-08-11 20:21:39,065][Main][INFO] - [train] Step 57350 out of 80000 | Loss --> 1.788 | Grad_l2 --> 0.307 | Weights_l2 --> 9089.458 | Lr --> 0.002 | Seconds_per_step --> 4.459 | [2024-08-11 20:25:23,468][Main][INFO] - [train] Step 57400 out of 80000 | Loss --> 1.790 | Grad_l2 --> 0.306 | Weights_l2 --> 9089.348 | Lr --> 0.002 | Seconds_per_step --> 4.488 | [2024-08-11 20:29:16,922][Main][INFO] - [train] Step 57450 out of 80000 | Loss --> 1.790 | Grad_l2 --> 0.308 | Weights_l2 --> 9089.251 | Lr --> 0.002 | Seconds_per_step --> 4.669 | [2024-08-11 20:33:07,082][Main][INFO] - [train] Step 57500 out of 80000 | Loss --> 1.788 | Grad_l2 --> 0.307 | Weights_l2 --> 9089.152 | Lr --> 0.002 | Seconds_per_step --> 4.603 | [2024-08-11 20:36:55,672][Main][INFO] - [train] Step 57550 out of 80000 | Loss --> 1.784 | Grad_l2 --> 0.307 | Weights_l2 --> 9089.054 | Lr --> 0.002 | Seconds_per_step --> 4.572 | [2024-08-11 20:40:43,035][Main][INFO] - [train] Step 57600 out of 80000 | Loss --> 1.782 | Grad_l2 --> 0.307 | Weights_l2 --> 9088.953 | Lr --> 0.002 | Seconds_per_step --> 4.547 | 
[2024-08-11 20:44:33,501][Main][INFO] - [train] Step 57650 out of 80000 | Loss --> 1.786 | Grad_l2 --> 0.307 | Weights_l2 --> 9088.842 | Lr --> 0.002 | Seconds_per_step --> 4.609 | [2024-08-11 20:48:25,676][Main][INFO] - [train] Step 57700 out of 80000 | Loss --> 1.779 | Grad_l2 --> 0.307 | Weights_l2 --> 9088.733 | Lr --> 0.002 | Seconds_per_step --> 4.643 | [2024-08-11 20:52:15,588][Main][INFO] - [train] Step 57750 out of 80000 | Loss --> 1.781 | Grad_l2 --> 0.307 | Weights_l2 --> 9088.639 | Lr --> 0.002 | Seconds_per_step --> 4.598 | [2024-08-11 20:56:06,157][Main][INFO] - [train] Step 57800 out of 80000 | Loss --> 1.778 | Grad_l2 --> 0.307 | Weights_l2 --> 9088.536 | Lr --> 0.002 | Seconds_per_step --> 4.611 | [2024-08-11 20:59:53,337][Main][INFO] - [train] Step 57850 out of 80000 | Loss --> 1.776 | Grad_l2 --> 0.307 | Weights_l2 --> 9088.436 | Lr --> 0.002 | Seconds_per_step --> 4.544 | [2024-08-11 21:03:44,489][Main][INFO] - [train] Step 57900 out of 80000 | Loss --> 1.778 | Grad_l2 --> 0.309 | Weights_l2 --> 9088.328 | Lr --> 0.002 | Seconds_per_step --> 4.623 | [2024-08-11 21:07:36,703][Main][INFO] - [train] Step 57950 out of 80000 | Loss --> 1.780 | Grad_l2 --> 0.307 | Weights_l2 --> 9088.214 | Lr --> 0.002 | Seconds_per_step --> 4.644 | [2024-08-11 21:11:29,888][Main][INFO] - [train] Step 58000 out of 80000 | Loss --> 1.781 | Grad_l2 --> 0.308 | Weights_l2 --> 9088.110 | Lr --> 0.002 | Seconds_per_step --> 4.664 | [2024-08-11 21:15:15,006][Main][INFO] - [train] Step 58050 out of 80000 | Loss --> 1.767 | Grad_l2 --> 0.309 | Weights_l2 --> 9088.006 | Lr --> 0.002 | Seconds_per_step --> 4.502 | [2024-08-11 21:19:01,376][Main][INFO] - [train] Step 58100 out of 80000 | Loss --> 1.774 | Grad_l2 --> 0.308 | Weights_l2 --> 9087.903 | Lr --> 0.002 | Seconds_per_step --> 4.527 | [2024-08-11 21:22:51,140][Main][INFO] - [train] Step 58150 out of 80000 | Loss --> 1.777 | Grad_l2 --> 0.309 | Weights_l2 --> 9087.793 | Lr --> 0.002 | Seconds_per_step --> 4.595 | 
[2024-08-11 21:26:35,859][Main][INFO] - [train] Step 58200 out of 80000 | Loss --> 1.775 | Grad_l2 --> 0.308 | Weights_l2 --> 9087.692 | Lr --> 0.002 | Seconds_per_step --> 4.494 | [2024-08-11 21:30:24,002][Main][INFO] - [train] Step 58250 out of 80000 | Loss --> 1.771 | Grad_l2 --> 0.309 | Weights_l2 --> 9087.588 | Lr --> 0.002 | Seconds_per_step --> 4.563 | [2024-08-11 21:34:15,810][Main][INFO] - [train] Step 58300 out of 80000 | Loss --> 1.764 | Grad_l2 --> 0.308 | Weights_l2 --> 9087.486 | Lr --> 0.002 | Seconds_per_step --> 4.636 | [2024-08-11 21:38:04,254][Main][INFO] - [train] Step 58350 out of 80000 | Loss --> 1.770 | Grad_l2 --> 0.309 | Weights_l2 --> 9087.387 | Lr --> 0.002 | Seconds_per_step --> 4.569 | [2024-08-11 21:41:45,046][Main][INFO] - [train] Step 58400 out of 80000 | Loss --> 1.759 | Grad_l2 --> 0.309 | Weights_l2 --> 9087.285 | Lr --> 0.002 | Seconds_per_step --> 4.416 | [2024-08-11 21:45:29,763][Main][INFO] - [train] Step 58450 out of 80000 | Loss --> 1.762 | Grad_l2 --> 0.308 | Weights_l2 --> 9087.180 | Lr --> 0.002 | Seconds_per_step --> 4.494 | [2024-08-11 21:49:16,119][Main][INFO] - [train] Step 58500 out of 80000 | Loss --> 1.764 | Grad_l2 --> 0.308 | Weights_l2 --> 9087.067 | Lr --> 0.002 | Seconds_per_step --> 4.527 | [2024-08-11 21:52:58,696][Main][INFO] - [train] Step 58550 out of 80000 | Loss --> 1.766 | Grad_l2 --> 0.308 | Weights_l2 --> 9086.963 | Lr --> 0.002 | Seconds_per_step --> 4.452 | [2024-08-11 21:56:46,334][Main][INFO] - [train] Step 58600 out of 80000 | Loss --> 1.762 | Grad_l2 --> 0.310 | Weights_l2 --> 9086.868 | Lr --> 0.002 | Seconds_per_step --> 4.553 | [2024-08-11 22:00:27,399][Main][INFO] - [train] Step 58650 out of 80000 | Loss --> 1.755 | Grad_l2 --> 0.310 | Weights_l2 --> 9086.770 | Lr --> 0.002 | Seconds_per_step --> 4.421 | [2024-08-11 22:04:12,722][Main][INFO] - [train] Step 58700 out of 80000 | Loss --> 1.757 | Grad_l2 --> 0.307 | Weights_l2 --> 9086.661 | Lr --> 0.002 | Seconds_per_step --> 4.506 | 
[2024-08-11 22:08:00,160][Main][INFO] - [train] Step 58750 out of 80000 | Loss --> 1.751 | Grad_l2 --> 0.308 | Weights_l2 --> 9086.563 | Lr --> 0.002 | Seconds_per_step --> 4.549 | [2024-08-11 22:11:44,169][Main][INFO] - [train] Step 58800 out of 80000 | Loss --> 1.752 | Grad_l2 --> 0.309 | Weights_l2 --> 9086.458 | Lr --> 0.002 | Seconds_per_step --> 4.480 | [2024-08-11 22:15:28,355][Main][INFO] - [train] Step 58850 out of 80000 | Loss --> 1.743 | Grad_l2 --> 0.307 | Weights_l2 --> 9086.355 | Lr --> 0.002 | Seconds_per_step --> 4.484 | [2024-08-11 22:19:13,149][Main][INFO] - [train] Step 58900 out of 80000 | Loss --> 1.745 | Grad_l2 --> 0.308 | Weights_l2 --> 9086.253 | Lr --> 0.002 | Seconds_per_step --> 4.496 | [2024-08-11 22:22:54,103][Main][INFO] - [train] Step 58950 out of 80000 | Loss --> 1.743 | Grad_l2 --> 0.308 | Weights_l2 --> 9086.151 | Lr --> 0.002 | Seconds_per_step --> 4.419 | [2024-08-11 22:26:42,100][Main][INFO] - [train] Step 59000 out of 80000 | Loss --> 1.755 | Grad_l2 --> 0.308 | Weights_l2 --> 9086.051 | Lr --> 0.002 | Seconds_per_step --> 4.560 | [2024-08-11 22:30:30,714][Main][INFO] - [train] Step 59050 out of 80000 | Loss --> 1.749 | Grad_l2 --> 0.308 | Weights_l2 --> 9085.948 | Lr --> 0.002 | Seconds_per_step --> 4.572 | [2024-08-11 22:34:12,979][Main][INFO] - [train] Step 59100 out of 80000 | Loss --> 1.759 | Grad_l2 --> 0.310 | Weights_l2 --> 9085.851 | Lr --> 0.002 | Seconds_per_step --> 4.445 | [2024-08-11 22:38:00,619][Main][INFO] - [train] Step 59150 out of 80000 | Loss --> 1.752 | Grad_l2 --> 0.308 | Weights_l2 --> 9085.755 | Lr --> 0.002 | Seconds_per_step --> 4.553 | [2024-08-11 22:41:41,913][Main][INFO] - [train] Step 59200 out of 80000 | Loss --> 1.755 | Grad_l2 --> 0.310 | Weights_l2 --> 9085.647 | Lr --> 0.002 | Seconds_per_step --> 4.426 | [2024-08-11 22:45:34,811][Main][INFO] - [train] Step 59250 out of 80000 | Loss --> 1.759 | Grad_l2 --> 0.310 | Weights_l2 --> 9085.551 | Lr --> 0.002 | Seconds_per_step --> 4.658 | 
[2024-08-11 22:49:19,551][Main][INFO] - [train] Step 59300 out of 80000 | Loss --> 1.753 | Grad_l2 --> 0.309 | Weights_l2 --> 9085.452 | Lr --> 0.002 | Seconds_per_step --> 4.495 | [2024-08-11 22:53:00,772][Main][INFO] - [train] Step 59350 out of 80000 | Loss --> 1.752 | Grad_l2 --> 0.311 | Weights_l2 --> 9085.363 | Lr --> 0.002 | Seconds_per_step --> 4.424 | [2024-08-11 22:56:45,139][Main][INFO] - [train] Step 59400 out of 80000 | Loss --> 1.760 | Grad_l2 --> 0.311 | Weights_l2 --> 9085.266 | Lr --> 0.002 | Seconds_per_step --> 4.487 | [2024-08-11 23:00:34,173][Main][INFO] - [train] Step 59450 out of 80000 | Loss --> 1.757 | Grad_l2 --> 0.311 | Weights_l2 --> 9085.158 | Lr --> 0.002 | Seconds_per_step --> 4.581 | [2024-08-11 23:04:21,635][Main][INFO] - [train] Step 59500 out of 80000 | Loss --> 1.754 | Grad_l2 --> 0.310 | Weights_l2 --> 9085.065 | Lr --> 0.002 | Seconds_per_step --> 4.549 | [2024-08-11 23:08:03,486][Main][INFO] - [train] Step 59550 out of 80000 | Loss --> 1.749 | Grad_l2 --> 0.310 | Weights_l2 --> 9084.969 | Lr --> 0.002 | Seconds_per_step --> 4.437 | [2024-08-11 23:11:45,006][Main][INFO] - [train] Step 59600 out of 80000 | Loss --> 1.764 | Grad_l2 --> 0.313 | Weights_l2 --> 9084.871 | Lr --> 0.002 | Seconds_per_step --> 4.430 | [2024-08-11 23:15:23,509][Main][INFO] - [train] Step 59650 out of 80000 | Loss --> 1.757 | Grad_l2 --> 0.311 | Weights_l2 --> 9084.777 | Lr --> 0.002 | Seconds_per_step --> 4.370 | [2024-08-11 23:19:01,925][Main][INFO] - [train] Step 59700 out of 80000 | Loss --> 1.760 | Grad_l2 --> 0.311 | Weights_l2 --> 9084.680 | Lr --> 0.002 | Seconds_per_step --> 4.368 | [2024-08-11 23:22:43,911][Main][INFO] - [train] Step 59750 out of 80000 | Loss --> 1.755 | Grad_l2 --> 0.311 | Weights_l2 --> 9084.580 | Lr --> 0.002 | Seconds_per_step --> 4.440 | [2024-08-11 23:26:25,067][Main][INFO] - [train] Step 59800 out of 80000 | Loss --> 1.748 | Grad_l2 --> 0.311 | Weights_l2 --> 9084.489 | Lr --> 0.002 | Seconds_per_step --> 4.423 | 
[2024-08-11 23:30:03,875][Main][INFO] - [train] Step 59850 out of 80000 | Loss --> 1.749 | Grad_l2 --> 0.311 | Weights_l2 --> 9084.392 | Lr --> 0.002 | Seconds_per_step --> 4.376 | [2024-08-11 23:33:42,430][Main][INFO] - [train] Step 59900 out of 80000 | Loss --> 1.761 | Grad_l2 --> 0.312 | Weights_l2 --> 9084.295 | Lr --> 0.002 | Seconds_per_step --> 4.371 | [2024-08-11 23:37:30,256][Main][INFO] - [train] Step 59950 out of 80000 | Loss --> 1.749 | Grad_l2 --> 0.313 | Weights_l2 --> 9084.198 | Lr --> 0.002 | Seconds_per_step --> 4.556 | [2024-08-11 23:41:15,929][Main][INFO] - [train] Step 60000 out of 80000 | Loss --> 1.763 | Grad_l2 --> 0.311 | Weights_l2 --> 9084.104 | Lr --> 0.002 | Seconds_per_step --> 4.513 | [2024-08-11 23:41:15,929][accelerate.accelerator][INFO] - Saving current state to checkpoint-pt-60000 [2024-08-11 23:41:15,933][accelerate.utils.other][WARNING] - Removed shared tensor {'encoder.embed_tokens.weight', 'decoder.embed_tokens.weight'} while saving. This should be OK, but check by verifying that you don't receive any warning while reloading [2024-08-11 23:41:18,954][accelerate.checkpointing][INFO] - Model weights saved in checkpoint-pt-60000/model.safetensors [2024-08-11 23:41:22,600][accelerate.checkpointing][INFO] - Optimizer state saved in checkpoint-pt-60000/optimizer.bin [2024-08-11 23:41:22,600][accelerate.checkpointing][INFO] - Scheduler state saved in checkpoint-pt-60000/scheduler.bin [2024-08-11 23:41:22,601][accelerate.checkpointing][INFO] - Sampler state for dataloader 0 saved in checkpoint-pt-60000/sampler.bin [2024-08-11 23:41:22,601][accelerate.checkpointing][INFO] - Sampler state for dataloader 1 saved in checkpoint-pt-60000/sampler_1.bin [2024-08-11 23:41:22,602][accelerate.checkpointing][INFO] - Random states saved in checkpoint-pt-60000/random_states_0.pkl [2024-08-11 23:45:07,108][Main][INFO] - [train] Step 60050 out of 80000 | Loss --> 1.754 | Grad_l2 --> 0.312 | Weights_l2 --> 9084.007 | Lr --> 0.002 | Seconds_per_step --> 
4.624 | [2024-08-11 23:48:44,602][Main][INFO] - [train] Step 60100 out of 80000 | Loss --> 1.760 | Grad_l2 --> 0.310 | Weights_l2 --> 9083.915 | Lr --> 0.002 | Seconds_per_step --> 4.350 | [2024-08-11 23:52:26,043][Main][INFO] - [train] Step 60150 out of 80000 | Loss --> 1.760 | Grad_l2 --> 0.312 | Weights_l2 --> 9083.829 | Lr --> 0.001 | Seconds_per_step --> 4.429 | [2024-08-11 23:56:03,737][Main][INFO] - [train] Step 60200 out of 80000 | Loss --> 1.771 | Grad_l2 --> 0.311 | Weights_l2 --> 9083.733 | Lr --> 0.001 | Seconds_per_step --> 4.354 | [2024-08-11 23:59:47,660][Main][INFO] - [train] Step 60250 out of 80000 | Loss --> 1.767 | Grad_l2 --> 0.312 | Weights_l2 --> 9083.640 | Lr --> 0.001 | Seconds_per_step --> 4.478 | [2024-08-12 00:03:32,244][Main][INFO] - [train] Step 60300 out of 80000 | Loss --> 1.771 | Grad_l2 --> 0.313 | Weights_l2 --> 9083.550 | Lr --> 0.001 | Seconds_per_step --> 4.492 | [2024-08-12 00:07:17,431][Main][INFO] - [train] Step 60350 out of 80000 | Loss --> 1.780 | Grad_l2 --> 0.314 | Weights_l2 --> 9083.451 | Lr --> 0.001 | Seconds_per_step --> 4.504 | [2024-08-12 00:11:01,326][Main][INFO] - [train] Step 60400 out of 80000 | Loss --> 1.768 | Grad_l2 --> 0.311 | Weights_l2 --> 9083.362 | Lr --> 0.001 | Seconds_per_step --> 4.478 | [2024-08-12 00:14:45,402][Main][INFO] - [train] Step 60450 out of 80000 | Loss --> 1.779 | Grad_l2 --> 0.313 | Weights_l2 --> 9083.267 | Lr --> 0.001 | Seconds_per_step --> 4.482 | [2024-08-12 00:18:30,537][Main][INFO] - [train] Step 60500 out of 80000 | Loss --> 1.780 | Grad_l2 --> 0.314 | Weights_l2 --> 9083.181 | Lr --> 0.001 | Seconds_per_step --> 4.503 | [2024-08-12 00:22:22,904][Main][INFO] - [train] Step 60550 out of 80000 | Loss --> 1.776 | Grad_l2 --> 0.312 | Weights_l2 --> 9083.093 | Lr --> 0.001 | Seconds_per_step --> 4.647 | [2024-08-12 00:26:07,858][Main][INFO] - [train] Step 60600 out of 80000 | Loss --> 1.779 | Grad_l2 --> 0.315 | Weights_l2 --> 9082.995 | Lr --> 0.001 | Seconds_per_step --> 4.499 | 
[2024-08-12 00:29:46,792][Main][INFO] - [train] Step 60650 out of 80000 | Loss --> 1.774 | Grad_l2 --> 0.312 | Weights_l2 --> 9082.900 | Lr --> 0.001 | Seconds_per_step --> 4.379 | [2024-08-12 00:33:31,405][Main][INFO] - [train] Step 60700 out of 80000 | Loss --> 1.774 | Grad_l2 --> 0.313 | Weights_l2 --> 9082.805 | Lr --> 0.001 | Seconds_per_step --> 4.492 | [2024-08-12 00:37:13,998][Main][INFO] - [train] Step 60750 out of 80000 | Loss --> 1.771 | Grad_l2 --> 0.313 | Weights_l2 --> 9082.715 | Lr --> 0.001 | Seconds_per_step --> 4.452 | [2024-08-12 00:40:53,937][Main][INFO] - [train] Step 60800 out of 80000 | Loss --> 1.766 | Grad_l2 --> 0.314 | Weights_l2 --> 9082.626 | Lr --> 0.001 | Seconds_per_step --> 4.399 | [2024-08-12 00:44:29,587][Main][INFO] - [train] Step 60850 out of 80000 | Loss --> 1.773 | Grad_l2 --> 0.315 | Weights_l2 --> 9082.535 | Lr --> 0.001 | Seconds_per_step --> 4.313 | [2024-08-12 00:48:07,873][Main][INFO] - [train] Step 60900 out of 80000 | Loss --> 1.778 | Grad_l2 --> 0.315 | Weights_l2 --> 9082.441 | Lr --> 0.001 | Seconds_per_step --> 4.366 | [2024-08-12 00:51:44,166][Main][INFO] - [train] Step 60950 out of 80000 | Loss --> 1.784 | Grad_l2 --> 0.314 | Weights_l2 --> 9082.349 | Lr --> 0.001 | Seconds_per_step --> 4.326 | [2024-08-12 00:55:24,836][Main][INFO] - [train] Step 61000 out of 80000 | Loss --> 1.775 | Grad_l2 --> 0.313 | Weights_l2 --> 9082.260 | Lr --> 0.001 | Seconds_per_step --> 4.413 | [2024-08-12 00:59:05,951][Main][INFO] - [train] Step 61050 out of 80000 | Loss --> 1.770 | Grad_l2 --> 0.314 | Weights_l2 --> 9082.170 | Lr --> 0.001 | Seconds_per_step --> 4.422 | [2024-08-12 01:02:44,096][Main][INFO] - [train] Step 61100 out of 80000 | Loss --> 1.763 | Grad_l2 --> 0.315 | Weights_l2 --> 9082.077 | Lr --> 0.001 | Seconds_per_step --> 4.363 | [2024-08-12 01:06:23,695][Main][INFO] - [train] Step 61150 out of 80000 | Loss --> 1.771 | Grad_l2 --> 0.315 | Weights_l2 --> 9081.988 | Lr --> 0.001 | Seconds_per_step --> 4.392 | 
[2024-08-12 01:10:01,742][Main][INFO] - [train] Step 61200 out of 80000 | Loss --> 1.769 | Grad_l2 --> 0.315 | Weights_l2 --> 9081.898 | Lr --> 0.001 | Seconds_per_step --> 4.361 | [2024-08-12 01:13:39,844][Main][INFO] - [train] Step 61250 out of 80000 | Loss --> 1.767 | Grad_l2 --> 0.315 | Weights_l2 --> 9081.809 | Lr --> 0.001 | Seconds_per_step --> 4.362 | [2024-08-12 01:17:15,187][Main][INFO] - [train] Step 61300 out of 80000 | Loss --> 1.767 | Grad_l2 --> 0.316 | Weights_l2 --> 9081.720 | Lr --> 0.001 | Seconds_per_step --> 4.307 | [2024-08-12 01:20:53,488][Main][INFO] - [train] Step 61350 out of 80000 | Loss --> 1.762 | Grad_l2 --> 0.318 | Weights_l2 --> 9081.631 | Lr --> 0.001 | Seconds_per_step --> 4.366 | [2024-08-12 01:24:34,284][Main][INFO] - [train] Step 61400 out of 80000 | Loss --> 1.764 | Grad_l2 --> 0.317 | Weights_l2 --> 9081.538 | Lr --> 0.001 | Seconds_per_step --> 4.416 | [2024-08-12 01:28:12,531][Main][INFO] - [train] Step 61450 out of 80000 | Loss --> 1.768 | Grad_l2 --> 0.317 | Weights_l2 --> 9081.448 | Lr --> 0.001 | Seconds_per_step --> 4.365 | [2024-08-12 01:31:52,508][Main][INFO] - [train] Step 61500 out of 80000 | Loss --> 1.770 | Grad_l2 --> 0.314 | Weights_l2 --> 9081.354 | Lr --> 0.001 | Seconds_per_step --> 4.400 | [2024-08-12 01:35:34,640][Main][INFO] - [train] Step 61550 out of 80000 | Loss --> 1.760 | Grad_l2 --> 0.314 | Weights_l2 --> 9081.260 | Lr --> 0.001 | Seconds_per_step --> 4.443 | [2024-08-12 01:39:17,817][Main][INFO] - [train] Step 61600 out of 80000 | Loss --> 1.766 | Grad_l2 --> 0.313 | Weights_l2 --> 9081.169 | Lr --> 0.001 | Seconds_per_step --> 4.464 | [2024-08-12 01:42:56,472][Main][INFO] - [train] Step 61650 out of 80000 | Loss --> 1.762 | Grad_l2 --> 0.316 | Weights_l2 --> 9081.077 | Lr --> 0.001 | Seconds_per_step --> 4.373 | [2024-08-12 01:46:34,161][Main][INFO] - [train] Step 61700 out of 80000 | Loss --> 1.771 | Grad_l2 --> 0.317 | Weights_l2 --> 9080.988 | Lr --> 0.001 | Seconds_per_step --> 4.354 | 
[2024-08-12 01:50:15,341][Main][INFO] - [train] Step 61750 out of 80000 | Loss --> 1.759 | Grad_l2 --> 0.315 | Weights_l2 --> 9080.903 | Lr --> 0.001 | Seconds_per_step --> 4.424 | [2024-08-12 01:53:56,286][Main][INFO] - [train] Step 61800 out of 80000 | Loss --> 1.760 | Grad_l2 --> 0.316 | Weights_l2 --> 9080.810 | Lr --> 0.001 | Seconds_per_step --> 4.419 | [2024-08-12 01:57:35,453][Main][INFO] - [train] Step 61850 out of 80000 | Loss --> 1.764 | Grad_l2 --> 0.317 | Weights_l2 --> 9080.725 | Lr --> 0.001 | Seconds_per_step --> 4.383 | [2024-08-12 02:01:14,106][Main][INFO] - [train] Step 61900 out of 80000 | Loss --> 1.765 | Grad_l2 --> 0.316 | Weights_l2 --> 9080.646 | Lr --> 0.001 | Seconds_per_step --> 4.373 | [2024-08-12 02:04:55,693][Main][INFO] - [train] Step 61950 out of 80000 | Loss --> 1.756 | Grad_l2 --> 0.316 | Weights_l2 --> 9080.552 | Lr --> 0.001 | Seconds_per_step --> 4.432 | [2024-08-12 02:08:35,956][Main][INFO] - [train] Step 62000 out of 80000 | Loss --> 1.757 | Grad_l2 --> 0.317 | Weights_l2 --> 9080.465 | Lr --> 0.001 | Seconds_per_step --> 4.405 | [2024-08-12 02:12:08,062][Main][INFO] - [train] Step 62050 out of 80000 | Loss --> 1.763 | Grad_l2 --> 0.316 | Weights_l2 --> 9080.379 | Lr --> 0.001 | Seconds_per_step --> 4.242 | [2024-08-12 02:15:46,511][Main][INFO] - [train] Step 62100 out of 80000 | Loss --> 1.762 | Grad_l2 --> 0.316 | Weights_l2 --> 9080.297 | Lr --> 0.001 | Seconds_per_step --> 4.369 | [2024-08-12 02:19:22,962][Main][INFO] - [train] Step 62150 out of 80000 | Loss --> 1.759 | Grad_l2 --> 0.318 | Weights_l2 --> 9080.214 | Lr --> 0.001 | Seconds_per_step --> 4.329 | [2024-08-12 02:22:58,963][Main][INFO] - [train] Step 62200 out of 80000 | Loss --> 1.760 | Grad_l2 --> 0.317 | Weights_l2 --> 9080.127 | Lr --> 0.001 | Seconds_per_step --> 4.320 | [2024-08-12 02:26:37,400][Main][INFO] - [train] Step 62250 out of 80000 | Loss --> 1.757 | Grad_l2 --> 0.317 | Weights_l2 --> 9080.042 | Lr --> 0.001 | Seconds_per_step --> 4.369 | 
[2024-08-12 02:30:16,727][Main][INFO] - [train] Step 62300 out of 80000 | Loss --> 1.772 | Grad_l2 --> 0.319 | Weights_l2 --> 9079.958 | Lr --> 0.001 | Seconds_per_step --> 4.387 | [2024-08-12 02:33:56,219][Main][INFO] - [train] Step 62350 out of 80000 | Loss --> 1.757 | Grad_l2 --> 0.316 | Weights_l2 --> 9079.868 | Lr --> 0.001 | Seconds_per_step --> 4.390 | [2024-08-12 02:37:37,466][Main][INFO] - [train] Step 62400 out of 80000 | Loss --> 1.758 | Grad_l2 --> 0.318 | Weights_l2 --> 9079.775 | Lr --> 0.001 | Seconds_per_step --> 4.425 | [2024-08-12 02:41:13,861][Main][INFO] - [train] Step 62450 out of 80000 | Loss --> 1.756 | Grad_l2 --> 0.318 | Weights_l2 --> 9079.691 | Lr --> 0.001 | Seconds_per_step --> 4.328 | [2024-08-12 02:44:48,098][Main][INFO] - [train] Step 62500 out of 80000 | Loss --> 1.754 | Grad_l2 --> 0.316 | Weights_l2 --> 9079.610 | Lr --> 0.001 | Seconds_per_step --> 4.285 | [2024-08-12 02:48:27,232][Main][INFO] - [train] Step 62550 out of 80000 | Loss --> 1.756 | Grad_l2 --> 0.317 | Weights_l2 --> 9079.519 | Lr --> 0.001 | Seconds_per_step --> 4.383 | [2024-08-12 02:52:03,675][Main][INFO] - [train] Step 62600 out of 80000 | Loss --> 1.751 | Grad_l2 --> 0.317 | Weights_l2 --> 9079.435 | Lr --> 0.001 | Seconds_per_step --> 4.329 | [2024-08-12 02:55:36,853][Main][INFO] - [train] Step 62650 out of 80000 | Loss --> 1.744 | Grad_l2 --> 0.318 | Weights_l2 --> 9079.354 | Lr --> 0.001 | Seconds_per_step --> 4.264 | [2024-08-12 02:59:13,236][Main][INFO] - [train] Step 62700 out of 80000 | Loss --> 1.755 | Grad_l2 --> 0.317 | Weights_l2 --> 9079.273 | Lr --> 0.001 | Seconds_per_step --> 4.328 | [2024-08-12 03:02:51,873][Main][INFO] - [train] Step 62750 out of 80000 | Loss --> 1.752 | Grad_l2 --> 0.316 | Weights_l2 --> 9079.186 | Lr --> 0.001 | Seconds_per_step --> 4.373 | [2024-08-12 03:06:17,555][Main][INFO] - [train] Step 62800 out of 80000 | Loss --> 1.756 | Grad_l2 --> 0.316 | Weights_l2 --> 9079.101 | Lr --> 0.001 | Seconds_per_step --> 4.114 | 
[2024-08-12 03:09:51,048][Main][INFO] - [train] Step 62850 out of 80000 | Loss --> 1.744 | Grad_l2 --> 0.317 | Weights_l2 --> 9079.013 | Lr --> 0.001 | Seconds_per_step --> 4.270 | [2024-08-12 03:13:26,624][Main][INFO] - [train] Step 62900 out of 80000 | Loss --> 1.747 | Grad_l2 --> 0.318 | Weights_l2 --> 9078.929 | Lr --> 0.001 | Seconds_per_step --> 4.311 | [2024-08-12 03:16:59,358][Main][INFO] - [train] Step 62950 out of 80000 | Loss --> 1.740 | Grad_l2 --> 0.316 | Weights_l2 --> 9078.845 | Lr --> 0.001 | Seconds_per_step --> 4.255 | [2024-08-12 03:20:28,856][Main][INFO] - [train] Step 63000 out of 80000 | Loss --> 1.749 | Grad_l2 --> 0.318 | Weights_l2 --> 9078.761 | Lr --> 0.001 | Seconds_per_step --> 4.190 | [2024-08-12 03:24:04,310][Main][INFO] - [train] Step 63050 out of 80000 | Loss --> 1.755 | Grad_l2 --> 0.318 | Weights_l2 --> 9078.679 | Lr --> 0.001 | Seconds_per_step --> 4.309 | [2024-08-12 03:27:41,598][Main][INFO] - [train] Step 63100 out of 80000 | Loss --> 1.755 | Grad_l2 --> 0.319 | Weights_l2 --> 9078.598 | Lr --> 0.001 | Seconds_per_step --> 4.346 | [2024-08-12 03:31:16,710][Main][INFO] - [train] Step 63150 out of 80000 | Loss --> 1.750 | Grad_l2 --> 0.318 | Weights_l2 --> 9078.515 | Lr --> 0.001 | Seconds_per_step --> 4.302 | [2024-08-12 03:34:45,673][Main][INFO] - [train] Step 63200 out of 80000 | Loss --> 1.763 | Grad_l2 --> 0.323 | Weights_l2 --> 9078.436 | Lr --> 0.001 | Seconds_per_step --> 4.179 | [2024-08-12 03:38:25,800][Main][INFO] - [train] Step 63250 out of 80000 | Loss --> 1.752 | Grad_l2 --> 0.318 | Weights_l2 --> 9078.355 | Lr --> 0.001 | Seconds_per_step --> 4.403 | [2024-08-12 03:41:57,982][Main][INFO] - [train] Step 63300 out of 80000 | Loss --> 1.760 | Grad_l2 --> 0.321 | Weights_l2 --> 9078.275 | Lr --> 0.001 | Seconds_per_step --> 4.244 | [2024-08-12 03:45:31,669][Main][INFO] - [train] Step 63350 out of 80000 | Loss --> 1.754 | Grad_l2 --> 0.316 | Weights_l2 --> 9078.195 | Lr --> 0.001 | Seconds_per_step --> 4.274 | 
[2024-08-12 03:48:59,956][Main][INFO] - [train] Step 63400 out of 80000 | Loss --> 1.758 | Grad_l2 --> 0.318 | Weights_l2 --> 9078.112 | Lr --> 0.001 | Seconds_per_step --> 4.166 | [2024-08-12 03:52:32,360][Main][INFO] - [train] Step 63450 out of 80000 | Loss --> 1.767 | Grad_l2 --> 0.321 | Weights_l2 --> 9078.029 | Lr --> 0.001 | Seconds_per_step --> 4.248 | [2024-08-12 03:56:03,506][Main][INFO] - [train] Step 63500 out of 80000 | Loss --> 1.752 | Grad_l2 --> 0.322 | Weights_l2 --> 9077.949 | Lr --> 0.001 | Seconds_per_step --> 4.223 | [2024-08-12 03:59:28,970][Main][INFO] - [train] Step 63550 out of 80000 | Loss --> 1.756 | Grad_l2 --> 0.320 | Weights_l2 --> 9077.869 | Lr --> 0.001 | Seconds_per_step --> 4.109 | [2024-08-12 04:02:55,079][Main][INFO] - [train] Step 63600 out of 80000 | Loss --> 1.754 | Grad_l2 --> 0.317 | Weights_l2 --> 9077.787 | Lr --> 0.001 | Seconds_per_step --> 4.122 | [2024-08-12 04:06:21,350][Main][INFO] - [train] Step 63650 out of 80000 | Loss --> 1.748 | Grad_l2 --> 0.321 | Weights_l2 --> 9077.704 | Lr --> 0.001 | Seconds_per_step --> 4.125 | [2024-08-12 04:09:46,832][Main][INFO] - [train] Step 63700 out of 80000 | Loss --> 1.744 | Grad_l2 --> 0.319 | Weights_l2 --> 9077.620 | Lr --> 0.001 | Seconds_per_step --> 4.110 | [2024-08-12 04:13:18,253][Main][INFO] - [train] Step 63750 out of 80000 | Loss --> 1.755 | Grad_l2 --> 0.319 | Weights_l2 --> 9077.542 | Lr --> 0.001 | Seconds_per_step --> 4.228 | [2024-08-12 04:16:46,689][Main][INFO] - [train] Step 63800 out of 80000 | Loss --> 1.748 | Grad_l2 --> 0.320 | Weights_l2 --> 9077.465 | Lr --> 0.001 | Seconds_per_step --> 4.169 | [2024-08-12 04:20:17,181][Main][INFO] - [train] Step 63850 out of 80000 | Loss --> 1.742 | Grad_l2 --> 0.319 | Weights_l2 --> 9077.385 | Lr --> 0.001 | Seconds_per_step --> 4.210 | [2024-08-12 04:23:44,511][Main][INFO] - [train] Step 63900 out of 80000 | Loss --> 1.751 | Grad_l2 --> 0.320 | Weights_l2 --> 9077.307 | Lr --> 0.001 | Seconds_per_step --> 4.147 | 
[2024-08-12 04:27:16,494][Main][INFO] - [train] Step 63950 out of 80000 | Loss --> 1.736 | Grad_l2 --> 0.319 | Weights_l2 --> 9077.231 | Lr --> 0.001 | Seconds_per_step --> 4.240 | [2024-08-12 04:30:45,204][Main][INFO] - [train] Step 64000 out of 80000 | Loss --> 1.735 | Grad_l2 --> 0.319 | Weights_l2 --> 9077.153 | Lr --> 0.001 | Seconds_per_step --> 4.174 | [2024-08-12 04:34:13,330][Main][INFO] - [train] Step 64050 out of 80000 | Loss --> 1.745 | Grad_l2 --> 0.319 | Weights_l2 --> 9077.076 | Lr --> 0.001 | Seconds_per_step --> 4.163 | [2024-08-12 04:37:45,142][Main][INFO] - [train] Step 64100 out of 80000 | Loss --> 1.738 | Grad_l2 --> 0.322 | Weights_l2 --> 9076.996 | Lr --> 0.001 | Seconds_per_step --> 4.236 | [2024-08-12 04:41:18,674][Main][INFO] - [train] Step 64150 out of 80000 | Loss --> 1.744 | Grad_l2 --> 0.322 | Weights_l2 --> 9076.921 | Lr --> 0.001 | Seconds_per_step --> 4.271 | [2024-08-12 04:44:48,178][Main][INFO] - [train] Step 64200 out of 80000 | Loss --> 1.744 | Grad_l2 --> 0.322 | Weights_l2 --> 9076.843 | Lr --> 0.001 | Seconds_per_step --> 4.190 | [2024-08-12 04:48:19,842][Main][INFO] - [train] Step 64250 out of 80000 | Loss --> 1.744 | Grad_l2 --> 0.321 | Weights_l2 --> 9076.764 | Lr --> 0.001 | Seconds_per_step --> 4.233 | [2024-08-12 04:51:52,645][Main][INFO] - [train] Step 64300 out of 80000 | Loss --> 1.745 | Grad_l2 --> 0.322 | Weights_l2 --> 9076.690 | Lr --> 0.001 | Seconds_per_step --> 4.256 | [2024-08-12 04:55:21,007][Main][INFO] - [train] Step 64350 out of 80000 | Loss --> 1.742 | Grad_l2 --> 0.321 | Weights_l2 --> 9076.614 | Lr --> 0.001 | Seconds_per_step --> 4.167 | [2024-08-12 04:58:49,316][Main][INFO] - [train] Step 64400 out of 80000 | Loss --> 1.726 | Grad_l2 --> 0.320 | Weights_l2 --> 9076.537 | Lr --> 0.001 | Seconds_per_step --> 4.166 | [2024-08-12 05:02:19,749][Main][INFO] - [train] Step 64450 out of 80000 | Loss --> 1.738 | Grad_l2 --> 0.321 | Weights_l2 --> 9076.460 | Lr --> 0.001 | Seconds_per_step --> 4.209 | 
[2024-08-12 05:05:46,590][Main][INFO] - [train] Step 64500 out of 80000 | Loss --> 1.731 | Grad_l2 --> 0.322 | Weights_l2 --> 9076.389 | Lr --> 0.001 | Seconds_per_step --> 4.137 | [2024-08-12 05:09:15,338][Main][INFO] - [train] Step 64550 out of 80000 | Loss --> 1.733 | Grad_l2 --> 0.323 | Weights_l2 --> 9076.316 | Lr --> 0.001 | Seconds_per_step --> 4.175 | [2024-08-12 05:12:48,312][Main][INFO] - [train] Step 64600 out of 80000 | Loss --> 1.734 | Grad_l2 --> 0.322 | Weights_l2 --> 9076.241 | Lr --> 0.001 | Seconds_per_step --> 4.259 | [2024-08-12 05:16:15,861][Main][INFO] - [train] Step 64650 out of 80000 | Loss --> 1.742 | Grad_l2 --> 0.324 | Weights_l2 --> 9076.163 | Lr --> 0.001 | Seconds_per_step --> 4.151 | [2024-08-12 05:19:47,700][Main][INFO] - [train] Step 64700 out of 80000 | Loss --> 1.731 | Grad_l2 --> 0.322 | Weights_l2 --> 9076.093 | Lr --> 0.001 | Seconds_per_step --> 4.237 | [2024-08-12 05:23:19,101][Main][INFO] - [train] Step 64750 out of 80000 | Loss --> 1.732 | Grad_l2 --> 0.324 | Weights_l2 --> 9076.017 | Lr --> 0.001 | Seconds_per_step --> 4.228 | [2024-08-12 05:26:53,820][Main][INFO] - [train] Step 64800 out of 80000 | Loss --> 1.730 | Grad_l2 --> 0.323 | Weights_l2 --> 9075.943 | Lr --> 0.001 | Seconds_per_step --> 4.294 | [2024-08-12 05:30:25,765][Main][INFO] - [train] Step 64850 out of 80000 | Loss --> 1.726 | Grad_l2 --> 0.323 | Weights_l2 --> 9075.868 | Lr --> 0.001 | Seconds_per_step --> 4.239 | [2024-08-12 05:33:57,856][Main][INFO] - [train] Step 64900 out of 80000 | Loss --> 1.729 | Grad_l2 --> 0.322 | Weights_l2 --> 9075.794 | Lr --> 0.001 | Seconds_per_step --> 4.242 | [2024-08-12 05:37:34,949][Main][INFO] - [train] Step 64950 out of 80000 | Loss --> 1.727 | Grad_l2 --> 0.324 | Weights_l2 --> 9075.723 | Lr --> 0.001 | Seconds_per_step --> 4.342 | [2024-08-12 05:41:34,479][Main][INFO] - [train] Step 65000 out of 80000 | Loss --> 1.731 | Grad_l2 --> 0.321 | Weights_l2 --> 9075.650 | Lr --> 0.001 | Seconds_per_step --> 4.791 | 
[2024-08-12 05:41:34,479][accelerate.accelerator][INFO] - Saving current state to checkpoint-pt-65000 [2024-08-12 05:41:34,483][accelerate.utils.other][WARNING] - Removed shared tensor {'encoder.embed_tokens.weight', 'decoder.embed_tokens.weight'} while saving. This should be OK, but check by verifying that you don't receive any warning while reloading [2024-08-12 05:41:37,445][accelerate.checkpointing][INFO] - Model weights saved in checkpoint-pt-65000/model.safetensors [2024-08-12 05:41:45,138][accelerate.checkpointing][INFO] - Optimizer state saved in checkpoint-pt-65000/optimizer.bin [2024-08-12 05:41:45,138][accelerate.checkpointing][INFO] - Scheduler state saved in checkpoint-pt-65000/scheduler.bin [2024-08-12 05:41:45,138][accelerate.checkpointing][INFO] - Sampler state for dataloader 0 saved in checkpoint-pt-65000/sampler.bin [2024-08-12 05:41:45,138][accelerate.checkpointing][INFO] - Sampler state for dataloader 1 saved in checkpoint-pt-65000/sampler_1.bin [2024-08-12 05:41:45,139][accelerate.checkpointing][INFO] - Random states saved in checkpoint-pt-65000/random_states_0.pkl [2024-08-12 05:45:35,690][Main][INFO] - [train] Step 65050 out of 80000 | Loss --> 1.735 | Grad_l2 --> 0.322 | Weights_l2 --> 9075.576 | Lr --> 0.001 | Seconds_per_step --> 4.824 | [2024-08-12 05:49:36,445][Main][INFO] - [train] Step 65100 out of 80000 | Loss --> 1.735 | Grad_l2 --> 0.323 | Weights_l2 --> 9075.504 | Lr --> 0.001 | Seconds_per_step --> 4.815 | [2024-08-12 05:53:48,172][Main][INFO] - [train] Step 65150 out of 80000 | Loss --> 1.735 | Grad_l2 --> 0.325 | Weights_l2 --> 9075.434 | Lr --> 0.001 | Seconds_per_step --> 5.035 | [2024-08-12 05:57:50,665][Main][INFO] - [train] Step 65200 out of 80000 | Loss --> 1.730 | Grad_l2 --> 0.325 | Weights_l2 --> 9075.358 | Lr --> 0.001 | Seconds_per_step --> 4.850 | [2024-08-12 06:01:51,474][Main][INFO] - [train] Step 65250 out of 80000 | Loss --> 1.730 | Grad_l2 --> 0.324 | Weights_l2 --> 9075.282 | Lr --> 0.001 | Seconds_per_step --> 
4.816 | [2024-08-12 06:05:53,194][Main][INFO] - [train] Step 65300 out of 80000 | Loss --> 1.739 | Grad_l2 --> 0.324 | Weights_l2 --> 9075.209 | Lr --> 0.001 | Seconds_per_step --> 4.834 | [2024-08-12 06:10:02,625][Main][INFO] - [train] Step 65350 out of 80000 | Loss --> 1.720 | Grad_l2 --> 0.322 | Weights_l2 --> 9075.137 | Lr --> 0.001 | Seconds_per_step --> 4.989 | [2024-08-12 06:14:02,960][Main][INFO] - [train] Step 65400 out of 80000 | Loss --> 1.733 | Grad_l2 --> 0.324 | Weights_l2 --> 9075.064 | Lr --> 0.001 | Seconds_per_step --> 4.807 | [2024-08-12 06:18:05,636][Main][INFO] - [train] Step 65450 out of 80000 | Loss --> 1.728 | Grad_l2 --> 0.324 | Weights_l2 --> 9074.992 | Lr --> 0.001 | Seconds_per_step --> 4.854 | [2024-08-12 06:22:11,143][Main][INFO] - [train] Step 65500 out of 80000 | Loss --> 1.725 | Grad_l2 --> 0.323 | Weights_l2 --> 9074.925 | Lr --> 0.001 | Seconds_per_step --> 4.910 | [2024-08-12 06:26:24,738][Main][INFO] - [train] Step 65550 out of 80000 | Loss --> 1.739 | Grad_l2 --> 0.326 | Weights_l2 --> 9074.848 | Lr --> 0.001 | Seconds_per_step --> 5.072 | [2024-08-12 06:30:24,331][Main][INFO] - [train] Step 65600 out of 80000 | Loss --> 1.735 | Grad_l2 --> 0.326 | Weights_l2 --> 9074.775 | Lr --> 0.001 | Seconds_per_step --> 4.792 | [2024-08-12 06:34:30,155][Main][INFO] - [train] Step 65650 out of 80000 | Loss --> 1.736 | Grad_l2 --> 0.326 | Weights_l2 --> 9074.702 | Lr --> 0.001 | Seconds_per_step --> 4.916 | [2024-08-12 06:38:38,111][Main][INFO] - [train] Step 65700 out of 80000 | Loss --> 1.738 | Grad_l2 --> 0.325 | Weights_l2 --> 9074.630 | Lr --> 0.001 | Seconds_per_step --> 4.959 | [2024-08-12 06:42:32,160][Main][INFO] - [train] Step 65750 out of 80000 | Loss --> 1.735 | Grad_l2 --> 0.327 | Weights_l2 --> 9074.556 | Lr --> 0.001 | Seconds_per_step --> 4.681 | [2024-08-12 06:46:31,254][Main][INFO] - [train] Step 65800 out of 80000 | Loss --> 1.742 | Grad_l2 --> 0.324 | Weights_l2 --> 9074.488 | Lr --> 0.001 | Seconds_per_step --> 4.782 | 
[2024-08-12 06:50:40,876][Main][INFO] - [train] Step 65850 out of 80000 | Loss --> 1.748 | Grad_l2 --> 0.327 | Weights_l2 --> 9074.414 | Lr --> 0.001 | Seconds_per_step --> 4.992 | [2024-08-12 06:54:45,474][Main][INFO] - [train] Step 65900 out of 80000 | Loss --> 1.747 | Grad_l2 --> 0.326 | Weights_l2 --> 9074.342 | Lr --> 0.001 | Seconds_per_step --> 4.892 | [2024-08-12 06:58:45,288][Main][INFO] - [train] Step 65950 out of 80000 | Loss --> 1.740 | Grad_l2 --> 0.326 | Weights_l2 --> 9074.268 | Lr --> 0.001 | Seconds_per_step --> 4.796 | [2024-08-12 07:02:45,790][Main][INFO] - [train] Step 66000 out of 80000 | Loss --> 1.730 | Grad_l2 --> 0.326 | Weights_l2 --> 9074.203 | Lr --> 0.001 | Seconds_per_step --> 4.810 | [2024-08-12 07:06:49,098][Main][INFO] - [train] Step 66050 out of 80000 | Loss --> 1.752 | Grad_l2 --> 0.328 | Weights_l2 --> 9074.128 | Lr --> 0.001 | Seconds_per_step --> 4.866 | [2024-08-12 07:10:28,457][Main][INFO] - [train] Step 66100 out of 80000 | Loss --> 1.750 | Grad_l2 --> 0.327 | Weights_l2 --> 9074.061 | Lr --> 0.001 | Seconds_per_step --> 4.387 | [2024-08-12 07:14:06,799][Main][INFO] - [train] Step 66150 out of 80000 | Loss --> 1.745 | Grad_l2 --> 0.327 | Weights_l2 --> 9073.990 | Lr --> 0.001 | Seconds_per_step --> 4.367 | [2024-08-12 07:17:45,000][Main][INFO] - [train] Step 66200 out of 80000 | Loss --> 1.746 | Grad_l2 --> 0.329 | Weights_l2 --> 9073.915 | Lr --> 0.001 | Seconds_per_step --> 4.364 | [2024-08-12 07:21:25,805][Main][INFO] - [train] Step 66250 out of 80000 | Loss --> 1.754 | Grad_l2 --> 0.327 | Weights_l2 --> 9073.847 | Lr --> 0.001 | Seconds_per_step --> 4.416 | [2024-08-12 07:25:07,094][Main][INFO] - [train] Step 66300 out of 80000 | Loss --> 1.761 | Grad_l2 --> 0.328 | Weights_l2 --> 9073.777 | Lr --> 0.001 | Seconds_per_step --> 4.426 | [2024-08-12 07:28:52,170][Main][INFO] - [train] Step 66350 out of 80000 | Loss --> 1.760 | Grad_l2 --> 0.329 | Weights_l2 --> 9073.706 | Lr --> 0.001 | Seconds_per_step --> 4.502 | 
[2024-08-12 07:32:44,959][Main][INFO] - [train] Step 66400 out of 80000 | Loss --> 1.753 | Grad_l2 --> 0.329 | Weights_l2 --> 9073.633 | Lr --> 0.001 | Seconds_per_step --> 4.656 | [2024-08-12 07:36:44,237][Main][INFO] - [train] Step 66450 out of 80000 | Loss --> 1.762 | Grad_l2 --> 0.328 | Weights_l2 --> 9073.563 | Lr --> 0.001 | Seconds_per_step --> 4.786 | [2024-08-12 07:40:37,789][Main][INFO] - [train] Step 66500 out of 80000 | Loss --> 1.761 | Grad_l2 --> 0.327 | Weights_l2 --> 9073.497 | Lr --> 0.001 | Seconds_per_step --> 4.671 | [2024-08-12 07:44:38,576][Main][INFO] - [train] Step 66550 out of 80000 | Loss --> 1.770 | Grad_l2 --> 0.331 | Weights_l2 --> 9073.425 | Lr --> 0.001 | Seconds_per_step --> 4.816 | [2024-08-12 07:48:46,663][Main][INFO] - [train] Step 66600 out of 80000 | Loss --> 1.770 | Grad_l2 --> 0.328 | Weights_l2 --> 9073.357 | Lr --> 0.001 | Seconds_per_step --> 4.962 | [2024-08-12 07:52:46,860][Main][INFO] - [train] Step 66650 out of 80000 | Loss --> 1.768 | Grad_l2 --> 0.330 | Weights_l2 --> 9073.285 | Lr --> 0.001 | Seconds_per_step --> 4.804 | [2024-08-12 07:56:36,923][Main][INFO] - [train] Step 66700 out of 80000 | Loss --> 1.768 | Grad_l2 --> 0.329 | Weights_l2 --> 9073.215 | Lr --> 0.001 | Seconds_per_step --> 4.601 | [2024-08-12 08:00:18,881][Main][INFO] - [train] Step 66750 out of 80000 | Loss --> 1.770 | Grad_l2 --> 0.331 | Weights_l2 --> 9073.141 | Lr --> 0.001 | Seconds_per_step --> 4.439 | [2024-08-12 08:04:03,533][Main][INFO] - [train] Step 66800 out of 80000 | Loss --> 1.769 | Grad_l2 --> 0.330 | Weights_l2 --> 9073.071 | Lr --> 0.001 | Seconds_per_step --> 4.493 | [2024-08-12 08:07:50,500][Main][INFO] - [train] Step 66850 out of 80000 | Loss --> 1.769 | Grad_l2 --> 0.331 | Weights_l2 --> 9073.004 | Lr --> 0.001 | Seconds_per_step --> 4.539 | [2024-08-12 08:11:49,816][Main][INFO] - [train] Step 66900 out of 80000 | Loss --> 1.768 | Grad_l2 --> 0.331 | Weights_l2 --> 9072.935 | Lr --> 0.001 | Seconds_per_step --> 4.786 | 
[2024-08-12 08:15:56,432][Main][INFO] - [train] Step 66950 out of 80000 | Loss --> 1.768 | Grad_l2 --> 0.331 | Weights_l2 --> 9072.867 | Lr --> 0.001 | Seconds_per_step --> 4.932 | [2024-08-12 08:20:02,525][Main][INFO] - [train] Step 67000 out of 80000 | Loss --> 1.779 | Grad_l2 --> 0.332 | Weights_l2 --> 9072.797 | Lr --> 0.001 | Seconds_per_step --> 4.922 | [2024-08-12 08:23:53,330][Main][INFO] - [train] Step 67050 out of 80000 | Loss --> 1.771 | Grad_l2 --> 0.333 | Weights_l2 --> 9072.730 | Lr --> 0.001 | Seconds_per_step --> 4.616 | [2024-08-12 08:27:56,587][Main][INFO] - [train] Step 67100 out of 80000 | Loss --> 1.774 | Grad_l2 --> 0.331 | Weights_l2 --> 9072.661 | Lr --> 0.001 | Seconds_per_step --> 4.865 | [2024-08-12 08:32:02,097][Main][INFO] - [train] Step 67150 out of 80000 | Loss --> 1.772 | Grad_l2 --> 0.331 | Weights_l2 --> 9072.592 | Lr --> 0.001 | Seconds_per_step --> 4.910 | [2024-08-12 08:36:03,847][Main][INFO] - [train] Step 67200 out of 80000 | Loss --> 1.774 | Grad_l2 --> 0.332 | Weights_l2 --> 9072.521 | Lr --> 0.001 | Seconds_per_step --> 4.835 | [2024-08-12 08:40:03,755][Main][INFO] - [train] Step 67250 out of 80000 | Loss --> 1.763 | Grad_l2 --> 0.331 | Weights_l2 --> 9072.457 | Lr --> 0.001 | Seconds_per_step --> 4.798 | [2024-08-12 08:44:12,833][Main][INFO] - [train] Step 67300 out of 80000 | Loss --> 1.769 | Grad_l2 --> 0.331 | Weights_l2 --> 9072.387 | Lr --> 0.001 | Seconds_per_step --> 4.982 | [2024-08-12 08:48:15,824][Main][INFO] - [train] Step 67350 out of 80000 | Loss --> 1.760 | Grad_l2 --> 0.331 | Weights_l2 --> 9072.319 | Lr --> 0.001 | Seconds_per_step --> 4.860 | [2024-08-12 08:52:17,176][Main][INFO] - [train] Step 67400 out of 80000 | Loss --> 1.766 | Grad_l2 --> 0.331 | Weights_l2 --> 9072.248 | Lr --> 0.001 | Seconds_per_step --> 4.827 | [2024-08-12 08:56:26,912][Main][INFO] - [train] Step 67450 out of 80000 | Loss --> 1.759 | Grad_l2 --> 0.332 | Weights_l2 --> 9072.181 | Lr --> 0.001 | Seconds_per_step --> 4.995 | 
[2024-08-12 09:00:28,981][Main][INFO] - [train] Step 67500 out of 80000 | Loss --> 1.772 | Grad_l2 --> 0.331 | Weights_l2 --> 9072.113 | Lr --> 0.001 | Seconds_per_step --> 4.841 | [2024-08-12 09:04:36,172][Main][INFO] - [train] Step 67550 out of 80000 | Loss --> 1.770 | Grad_l2 --> 0.335 | Weights_l2 --> 9072.048 | Lr --> 0.001 | Seconds_per_step --> 4.944 | [2024-08-12 09:08:49,679][Main][INFO] - [train] Step 67600 out of 80000 | Loss --> 1.766 | Grad_l2 --> 0.335 | Weights_l2 --> 9071.978 | Lr --> 0.001 | Seconds_per_step --> 5.070 | [2024-08-12 09:12:58,709][Main][INFO] - [train] Step 67650 out of 80000 | Loss --> 1.764 | Grad_l2 --> 0.331 | Weights_l2 --> 9071.910 | Lr --> 0.001 | Seconds_per_step --> 4.981 | [2024-08-12 09:17:14,413][Main][INFO] - [train] Step 67700 out of 80000 | Loss --> 1.765 | Grad_l2 --> 0.331 | Weights_l2 --> 9071.843 | Lr --> 0.001 | Seconds_per_step --> 5.114 | [2024-08-12 09:21:11,505][Main][INFO] - [train] Step 67750 out of 80000 | Loss --> 1.765 | Grad_l2 --> 0.331 | Weights_l2 --> 9071.774 | Lr --> 0.001 | Seconds_per_step --> 4.742 | [2024-08-12 09:25:15,107][Main][INFO] - [train] Step 67800 out of 80000 | Loss --> 1.755 | Grad_l2 --> 0.332 | Weights_l2 --> 9071.709 | Lr --> 0.001 | Seconds_per_step --> 4.872 | [2024-08-12 09:29:20,556][Main][INFO] - [train] Step 67850 out of 80000 | Loss --> 1.752 | Grad_l2 --> 0.330 | Weights_l2 --> 9071.643 | Lr --> 0.001 | Seconds_per_step --> 4.909 | [2024-08-12 09:33:24,433][Main][INFO] - [train] Step 67900 out of 80000 | Loss --> 1.751 | Grad_l2 --> 0.334 | Weights_l2 --> 9071.575 | Lr --> 0.001 | Seconds_per_step --> 4.878 | [2024-08-12 09:37:21,053][Main][INFO] - [train] Step 67950 out of 80000 | Loss --> 1.749 | Grad_l2 --> 0.335 | Weights_l2 --> 9071.510 | Lr --> 0.001 | Seconds_per_step --> 4.732 | [2024-08-12 09:41:30,689][Main][INFO] - [train] Step 68000 out of 80000 | Loss --> 1.755 | Grad_l2 --> 0.331 | Weights_l2 --> 9071.446 | Lr --> 0.001 | Seconds_per_step --> 4.993 | 
[2024-08-12 09:45:33,925][Main][INFO] - [train] Step 68050 out of 80000 | Loss --> 1.751 | Grad_l2 --> 0.333 | Weights_l2 --> 9071.382 | Lr --> 0.001 | Seconds_per_step --> 4.865 | [2024-08-12 09:49:32,988][Main][INFO] - [train] Step 68100 out of 80000 | Loss --> 1.751 | Grad_l2 --> 0.331 | Weights_l2 --> 9071.320 | Lr --> 0.001 | Seconds_per_step --> 4.781 | [2024-08-12 09:53:35,684][Main][INFO] - [train] Step 68150 out of 80000 | Loss --> 1.750 | Grad_l2 --> 0.334 | Weights_l2 --> 9071.255 | Lr --> 0.001 | Seconds_per_step --> 4.854 | [2024-08-12 09:57:46,641][Main][INFO] - [train] Step 68200 out of 80000 | Loss --> 1.746 | Grad_l2 --> 0.333 | Weights_l2 --> 9071.196 | Lr --> 0.001 | Seconds_per_step --> 5.019 | [2024-08-12 10:01:44,603][Main][INFO] - [train] Step 68250 out of 80000 | Loss --> 1.744 | Grad_l2 --> 0.333 | Weights_l2 --> 9071.133 | Lr --> 0.001 | Seconds_per_step --> 4.759 | [2024-08-12 10:05:41,867][Main][INFO] - [train] Step 68300 out of 80000 | Loss --> 1.738 | Grad_l2 --> 0.334 | Weights_l2 --> 9071.075 | Lr --> 0.001 | Seconds_per_step --> 4.745 | [2024-08-12 10:09:51,335][Main][INFO] - [train] Step 68350 out of 80000 | Loss --> 1.748 | Grad_l2 --> 0.333 | Weights_l2 --> 9071.016 | Lr --> 0.001 | Seconds_per_step --> 4.989 | [2024-08-12 10:14:03,242][Main][INFO] - [train] Step 68400 out of 80000 | Loss --> 1.762 | Grad_l2 --> 0.334 | Weights_l2 --> 9070.960 | Lr --> 0.001 | Seconds_per_step --> 5.038 | [2024-08-12 10:17:57,549][Main][INFO] - [train] Step 68450 out of 80000 | Loss --> 1.751 | Grad_l2 --> 0.333 | Weights_l2 --> 9070.902 | Lr --> 0.001 | Seconds_per_step --> 4.686 | [2024-08-12 10:22:04,017][Main][INFO] - [train] Step 68500 out of 80000 | Loss --> 1.754 | Grad_l2 --> 0.333 | Weights_l2 --> 9070.845 | Lr --> 0.001 | Seconds_per_step --> 4.929 | [2024-08-12 10:26:15,511][Main][INFO] - [train] Step 68550 out of 80000 | Loss --> 1.744 | Grad_l2 --> 0.333 | Weights_l2 --> 9070.785 | Lr --> 0.001 | Seconds_per_step --> 5.030 | 
[2024-08-12 10:30:12,459][Main][INFO] - [train] Step 68600 out of 80000 | Loss --> 1.748 | Grad_l2 --> 0.334 | Weights_l2 --> 9070.729 | Lr --> 0.001 | Seconds_per_step --> 4.739 | [2024-08-12 10:34:09,711][Main][INFO] - [train] Step 68650 out of 80000 | Loss --> 1.744 | Grad_l2 --> 0.333 | Weights_l2 --> 9070.674 | Lr --> 0.001 | Seconds_per_step --> 4.745 | [2024-08-12 10:38:15,758][Main][INFO] - [train] Step 68700 out of 80000 | Loss --> 1.747 | Grad_l2 --> 0.333 | Weights_l2 --> 9070.620 | Lr --> 0.001 | Seconds_per_step --> 4.921 | [2024-08-12 10:42:31,275][Main][INFO] - [train] Step 68750 out of 80000 | Loss --> 1.752 | Grad_l2 --> 0.334 | Weights_l2 --> 9070.565 | Lr --> 0.001 | Seconds_per_step --> 5.110 | [2024-08-12 10:46:29,239][Main][INFO] - [train] Step 68800 out of 80000 | Loss --> 1.756 | Grad_l2 --> 0.336 | Weights_l2 --> 9070.513 | Lr --> 0.001 | Seconds_per_step --> 4.759 | [2024-08-12 10:50:35,687][Main][INFO] - [train] Step 68850 out of 80000 | Loss --> 1.747 | Grad_l2 --> 0.336 | Weights_l2 --> 9070.463 | Lr --> 0.000 | Seconds_per_step --> 4.929 | [2024-08-12 10:54:45,439][Main][INFO] - [train] Step 68900 out of 80000 | Loss --> 1.746 | Grad_l2 --> 0.334 | Weights_l2 --> 9070.413 | Lr --> 0.000 | Seconds_per_step --> 4.995 | [2024-08-12 10:58:49,957][Main][INFO] - [train] Step 68950 out of 80000 | Loss --> 1.741 | Grad_l2 --> 0.333 | Weights_l2 --> 9070.362 | Lr --> 0.000 | Seconds_per_step --> 4.890 | [2024-08-12 11:02:50,584][Main][INFO] - [train] Step 69000 out of 80000 | Loss --> 1.740 | Grad_l2 --> 0.333 | Weights_l2 --> 9070.312 | Lr --> 0.000 | Seconds_per_step --> 4.813 | [2024-08-12 11:07:04,656][Main][INFO] - [train] Step 69050 out of 80000 | Loss --> 1.744 | Grad_l2 --> 0.336 | Weights_l2 --> 9070.263 | Lr --> 0.000 | Seconds_per_step --> 5.081 | [2024-08-12 11:11:13,715][Main][INFO] - [train] Step 69100 out of 80000 | Loss --> 1.738 | Grad_l2 --> 0.336 | Weights_l2 --> 9070.214 | Lr --> 0.000 | Seconds_per_step --> 4.981 | 
[2024-08-12 11:15:08,470][Main][INFO] - [train] Step 69150 out of 80000 | Loss --> 1.737 | Grad_l2 --> 0.335 | Weights_l2 --> 9070.167 | Lr --> 0.000 | Seconds_per_step --> 4.695 | [2024-08-12 11:19:18,203][Main][INFO] - [train] Step 69200 out of 80000 | Loss --> 1.739 | Grad_l2 --> 0.336 | Weights_l2 --> 9070.119 | Lr --> 0.000 | Seconds_per_step --> 4.995 | [2024-08-12 11:23:36,177][Main][INFO] - [train] Step 69250 out of 80000 | Loss --> 1.735 | Grad_l2 --> 0.334 | Weights_l2 --> 9070.077 | Lr --> 0.000 | Seconds_per_step --> 5.159 | [2024-08-12 11:27:32,982][Main][INFO] - [train] Step 69300 out of 80000 | Loss --> 1.731 | Grad_l2 --> 0.335 | Weights_l2 --> 9070.031 | Lr --> 0.000 | Seconds_per_step --> 4.736 | [2024-08-12 11:31:35,613][Main][INFO] - [train] Step 69350 out of 80000 | Loss --> 1.736 | Grad_l2 --> 0.336 | Weights_l2 --> 9069.988 | Lr --> 0.000 | Seconds_per_step --> 4.853 | [2024-08-12 11:35:44,767][Main][INFO] - [train] Step 69400 out of 80000 | Loss --> 1.733 | Grad_l2 --> 0.336 | Weights_l2 --> 9069.946 | Lr --> 0.000 | Seconds_per_step --> 4.983 | [2024-08-12 11:39:44,712][Main][INFO] - [train] Step 69450 out of 80000 | Loss --> 1.740 | Grad_l2 --> 0.336 | Weights_l2 --> 9069.900 | Lr --> 0.000 | Seconds_per_step --> 4.799 | [2024-08-12 11:43:39,145][Main][INFO] - [train] Step 69500 out of 80000 | Loss --> 1.742 | Grad_l2 --> 0.338 | Weights_l2 --> 9069.857 | Lr --> 0.000 | Seconds_per_step --> 4.689 | [2024-08-12 11:47:43,420][Main][INFO] - [train] Step 69550 out of 80000 | Loss --> 1.736 | Grad_l2 --> 0.337 | Weights_l2 --> 9069.815 | Lr --> 0.000 | Seconds_per_step --> 4.885 | [2024-08-12 11:51:55,140][Main][INFO] - [train] Step 69600 out of 80000 | Loss --> 1.741 | Grad_l2 --> 0.336 | Weights_l2 --> 9069.774 | Lr --> 0.000 | Seconds_per_step --> 5.034 | [2024-08-12 11:55:50,294][Main][INFO] - [train] Step 69650 out of 80000 | Loss --> 1.744 | Grad_l2 --> 0.337 | Weights_l2 --> 9069.734 | Lr --> 0.000 | Seconds_per_step --> 4.703 | 
[2024-08-12 11:59:50,709][Main][INFO] - [train] Step 69700 out of 80000 | Loss --> 1.739 | Grad_l2 --> 0.336 | Weights_l2 --> 9069.695 | Lr --> 0.000 | Seconds_per_step --> 4.808 | [2024-08-12 12:03:56,289][Main][INFO] - [train] Step 69750 out of 80000 | Loss --> 1.746 | Grad_l2 --> 0.337 | Weights_l2 --> 9069.657 | Lr --> 0.000 | Seconds_per_step --> 4.912 | [2024-08-12 12:08:00,668][Main][INFO] - [train] Step 69800 out of 80000 | Loss --> 1.745 | Grad_l2 --> 0.338 | Weights_l2 --> 9069.619 | Lr --> 0.000 | Seconds_per_step --> 4.888 | [2024-08-12 12:11:55,491][Main][INFO] - [train] Step 69850 out of 80000 | Loss --> 1.743 | Grad_l2 --> 0.339 | Weights_l2 --> 9069.580 | Lr --> 0.000 | Seconds_per_step --> 4.696 | [2024-08-12 12:15:58,225][Main][INFO] - [train] Step 69900 out of 80000 | Loss --> 1.746 | Grad_l2 --> 0.337 | Weights_l2 --> 9069.544 | Lr --> 0.000 | Seconds_per_step --> 4.855 | [2024-08-12 12:20:07,222][Main][INFO] - [train] Step 69950 out of 80000 | Loss --> 1.740 | Grad_l2 --> 0.337 | Weights_l2 --> 9069.507 | Lr --> 0.000 | Seconds_per_step --> 4.980 | [2024-08-12 12:24:09,812][Main][INFO] - [train] Step 70000 out of 80000 | Loss --> 1.740 | Grad_l2 --> 0.338 | Weights_l2 --> 9069.472 | Lr --> 0.000 | Seconds_per_step --> 4.852 | [2024-08-12 12:24:09,812][accelerate.accelerator][INFO] - Saving current state to checkpoint-pt-70000 [2024-08-12 12:24:09,816][accelerate.utils.other][WARNING] - Removed shared tensor {'encoder.embed_tokens.weight', 'decoder.embed_tokens.weight'} while saving. 
This should be OK, but check by verifying that you don't receive any warning while reloading [2024-08-12 12:24:13,019][accelerate.checkpointing][INFO] - Model weights saved in checkpoint-pt-70000/model.safetensors [2024-08-12 12:24:16,995][accelerate.checkpointing][INFO] - Optimizer state saved in checkpoint-pt-70000/optimizer.bin [2024-08-12 12:24:16,996][accelerate.checkpointing][INFO] - Scheduler state saved in checkpoint-pt-70000/scheduler.bin [2024-08-12 12:24:16,996][accelerate.checkpointing][INFO] - Sampler state for dataloader 0 saved in checkpoint-pt-70000/sampler.bin [2024-08-12 12:24:16,996][accelerate.checkpointing][INFO] - Sampler state for dataloader 1 saved in checkpoint-pt-70000/sampler_1.bin [2024-08-12 12:24:16,997][accelerate.checkpointing][INFO] - Random states saved in checkpoint-pt-70000/random_states_0.pkl [2024-08-12 12:28:17,924][Main][INFO] - [train] Step 70050 out of 80000 | Loss --> 1.747 | Grad_l2 --> 0.337 | Weights_l2 --> 9069.436 | Lr --> 0.000 | Seconds_per_step --> 4.962 | [2024-08-12 12:32:24,318][Main][INFO] - [train] Step 70100 out of 80000 | Loss --> 1.738 | Grad_l2 --> 0.338 | Weights_l2 --> 9069.403 | Lr --> 0.000 | Seconds_per_step --> 4.928 | [2024-08-12 12:36:36,478][Main][INFO] - [train] Step 70150 out of 80000 | Loss --> 1.752 | Grad_l2 --> 0.339 | Weights_l2 --> 9069.369 | Lr --> 0.000 | Seconds_per_step --> 5.043 | [2024-08-12 12:40:33,276][Main][INFO] - [train] Step 70200 out of 80000 | Loss --> 1.736 | Grad_l2 --> 0.337 | Weights_l2 --> 9069.335 | Lr --> 0.000 | Seconds_per_step --> 4.736 | [2024-08-12 12:44:33,787][Main][INFO] - [train] Step 70250 out of 80000 | Loss --> 1.746 | Grad_l2 --> 0.339 | Weights_l2 --> 9069.302 | Lr --> 0.000 | Seconds_per_step --> 4.810 | [2024-08-12 12:48:49,628][Main][INFO] - [train] Step 70300 out of 80000 | Loss --> 1.737 | Grad_l2 --> 0.340 | Weights_l2 --> 9069.272 | Lr --> 0.000 | Seconds_per_step --> 5.117 | [2024-08-12 12:52:50,330][Main][INFO] - [train] Step 70350 out of 80000 
| Loss --> 1.741 | Grad_l2 --> 0.339 | Weights_l2 --> 9069.241 | Lr --> 0.000 | Seconds_per_step --> 4.814 | [2024-08-12 12:56:47,628][Main][INFO] - [train] Step 70400 out of 80000 | Loss --> 1.747 | Grad_l2 --> 0.340 | Weights_l2 --> 9069.210 | Lr --> 0.000 | Seconds_per_step --> 4.746 | [2024-08-12 13:00:53,896][Main][INFO] - [train] Step 70450 out of 80000 | Loss --> 1.745 | Grad_l2 --> 0.339 | Weights_l2 --> 9069.181 | Lr --> 0.000 | Seconds_per_step --> 4.925 | [2024-08-12 13:05:13,889][Main][INFO] - [train] Step 70500 out of 80000 | Loss --> 1.746 | Grad_l2 --> 0.340 | Weights_l2 --> 9069.153 | Lr --> 0.000 | Seconds_per_step --> 5.200 | [2024-08-12 13:09:07,510][Main][INFO] - [train] Step 70550 out of 80000 | Loss --> 1.732 | Grad_l2 --> 0.339 | Weights_l2 --> 9069.124 | Lr --> 0.000 | Seconds_per_step --> 4.672 | [2024-08-12 13:13:07,689][Main][INFO] - [train] Step 70600 out of 80000 | Loss --> 1.736 | Grad_l2 --> 0.340 | Weights_l2 --> 9069.095 | Lr --> 0.000 | Seconds_per_step --> 4.804 | [2024-08-12 13:17:20,202][Main][INFO] - [train] Step 70650 out of 80000 | Loss --> 1.739 | Grad_l2 --> 0.339 | Weights_l2 --> 9069.067 | Lr --> 0.000 | Seconds_per_step --> 5.050 | [2024-08-12 13:21:31,602][Main][INFO] - [train] Step 70700 out of 80000 | Loss --> 1.741 | Grad_l2 --> 0.342 | Weights_l2 --> 9069.040 | Lr --> 0.000 | Seconds_per_step --> 5.028 | [2024-08-12 13:25:28,721][Main][INFO] - [train] Step 70750 out of 80000 | Loss --> 1.739 | Grad_l2 --> 0.341 | Weights_l2 --> 9069.015 | Lr --> 0.000 | Seconds_per_step --> 4.742 | [2024-08-12 13:29:41,170][Main][INFO] - [train] Step 70800 out of 80000 | Loss --> 1.748 | Grad_l2 --> 0.343 | Weights_l2 --> 9068.990 | Lr --> 0.000 | Seconds_per_step --> 5.049 | [2024-08-12 13:33:45,475][Main][INFO] - [train] Step 70850 out of 80000 | Loss --> 1.740 | Grad_l2 --> 0.339 | Weights_l2 --> 9068.965 | Lr --> 0.000 | Seconds_per_step --> 4.886 | [2024-08-12 13:37:50,507][Main][INFO] - [train] Step 70900 out of 80000 | Loss 
--> 1.729 | Grad_l2 --> 0.338 | Weights_l2 --> 9068.942 | Lr --> 0.000 | Seconds_per_step --> 4.901 | [2024-08-12 13:41:51,338][Main][INFO] - [train] Step 70950 out of 80000 | Loss --> 1.729 | Grad_l2 --> 0.341 | Weights_l2 --> 9068.918 | Lr --> 0.000 | Seconds_per_step --> 4.817 | [2024-08-12 13:46:05,634][Main][INFO] - [train] Step 71000 out of 80000 | Loss --> 1.721 | Grad_l2 --> 0.341 | Weights_l2 --> 9068.894 | Lr --> 0.000 | Seconds_per_step --> 5.086 | [2024-08-12 13:50:09,098][Main][INFO] - [train] Step 71050 out of 80000 | Loss --> 1.729 | Grad_l2 --> 0.341 | Weights_l2 --> 9068.871 | Lr --> 0.000 | Seconds_per_step --> 4.869 | [2024-08-12 13:54:09,531][Main][INFO] - [train] Step 71100 out of 80000 | Loss --> 1.728 | Grad_l2 --> 0.343 | Weights_l2 --> 9068.849 | Lr --> 0.000 | Seconds_per_step --> 4.809 | [2024-08-12 13:58:21,788][Main][INFO] - [train] Step 71150 out of 80000 | Loss --> 1.726 | Grad_l2 --> 0.341 | Weights_l2 --> 9068.829 | Lr --> 0.000 | Seconds_per_step --> 5.045 | [2024-08-12 14:02:21,447][Main][INFO] - [train] Step 71200 out of 80000 | Loss --> 1.725 | Grad_l2 --> 0.341 | Weights_l2 --> 9068.809 | Lr --> 0.000 | Seconds_per_step --> 4.793 | [2024-08-12 14:06:20,014][Main][INFO] - [train] Step 71250 out of 80000 | Loss --> 1.725 | Grad_l2 --> 0.341 | Weights_l2 --> 9068.789 | Lr --> 0.000 | Seconds_per_step --> 4.771 | [2024-08-12 14:10:32,195][Main][INFO] - [train] Step 71300 out of 80000 | Loss --> 1.713 | Grad_l2 --> 0.342 | Weights_l2 --> 9068.770 | Lr --> 0.000 | Seconds_per_step --> 5.044 | [2024-08-12 14:14:53,155][Main][INFO] - [train] Step 71350 out of 80000 | Loss --> 1.712 | Grad_l2 --> 0.344 | Weights_l2 --> 9068.751 | Lr --> 0.000 | Seconds_per_step --> 5.219 | [2024-08-12 14:18:49,772][Main][INFO] - [train] Step 71400 out of 80000 | Loss --> 1.714 | Grad_l2 --> 0.342 | Weights_l2 --> 9068.734 | Lr --> 0.000 | Seconds_per_step --> 4.732 | [2024-08-12 14:22:50,205][Main][INFO] - [train] Step 71450 out of 80000 | Loss --> 
1.708 | Grad_l2 --> 0.340 | Weights_l2 --> 9068.718 | Lr --> 0.000 | Seconds_per_step --> 4.809 | [2024-08-12 14:26:37,544][Main][INFO] - [train] Step 71500 out of 80000 | Loss --> 1.713 | Grad_l2 --> 0.341 | Weights_l2 --> 9068.702 | Lr --> 0.000 | Seconds_per_step --> 4.547 | [2024-08-12 14:30:43,794][Main][INFO] - [train] Step 71550 out of 80000 | Loss --> 1.703 | Grad_l2 --> 0.342 | Weights_l2 --> 9068.686 | Lr --> 0.000 | Seconds_per_step --> 4.925 | [2024-08-12 14:34:43,687][Main][INFO] - [train] Step 71600 out of 80000 | Loss --> 1.705 | Grad_l2 --> 0.340 | Weights_l2 --> 9068.668 | Lr --> 0.000 | Seconds_per_step --> 4.798 | [2024-08-12 14:38:41,113][Main][INFO] - [train] Step 71650 out of 80000 | Loss --> 1.705 | Grad_l2 --> 0.340 | Weights_l2 --> 9068.651 | Lr --> 0.000 | Seconds_per_step --> 4.748 | [2024-08-12 14:42:57,491][Main][INFO] - [train] Step 71700 out of 80000 | Loss --> 1.705 | Grad_l2 --> 0.342 | Weights_l2 --> 9068.634 | Lr --> 0.000 | Seconds_per_step --> 5.128 | [2024-08-12 14:46:57,538][Main][INFO] - [train] Step 71750 out of 80000 | Loss --> 1.710 | Grad_l2 --> 0.342 | Weights_l2 --> 9068.619 | Lr --> 0.000 | Seconds_per_step --> 4.801 | [2024-08-12 14:50:54,553][Main][INFO] - [train] Step 71800 out of 80000 | Loss --> 1.705 | Grad_l2 --> 0.343 | Weights_l2 --> 9068.603 | Lr --> 0.000 | Seconds_per_step --> 4.740 | [2024-08-12 14:55:04,238][Main][INFO] - [train] Step 71850 out of 80000 | Loss --> 1.703 | Grad_l2 --> 0.342 | Weights_l2 --> 9068.590 | Lr --> 0.000 | Seconds_per_step --> 4.994 | [2024-08-12 14:59:14,111][Main][INFO] - [train] Step 71900 out of 80000 | Loss --> 1.702 | Grad_l2 --> 0.341 | Weights_l2 --> 9068.573 | Lr --> 0.000 | Seconds_per_step --> 4.997 | [2024-08-12 15:00:26,188][huggingface_hub.utils._http][WARNING] - '(ReadTimeoutError("HTTPSConnectionPool(host='huggingface.co', port=443): Read timed out. 
(read timeout=10)"), '(Request ID: c139f443-e606-47ad-b955-2e73792b3841)')' thrown while requesting GET https://huggingface.co/datasets/HuggingFaceTB/smollm-corpus/resolve/c074f3d3783ef8c321b40fd89088e5955cd05bad/fineweb-edu-dedup/train-00103-of-00234.parquet [2024-08-12 15:00:26,189][huggingface_hub.utils._http][WARNING] - Retrying in 1s [Retry 1/5]. [2024-08-12 15:00:37,239][huggingface_hub.utils._http][WARNING] - '(ReadTimeoutError("HTTPSConnectionPool(host='huggingface.co', port=443): Read timed out. (read timeout=10)"), '(Request ID: 6d4c6bb1-c809-4736-966e-a86e5016b21c)')' thrown while requesting GET https://huggingface.co/datasets/HuggingFaceTB/smollm-corpus/resolve/c074f3d3783ef8c321b40fd89088e5955cd05bad/fineweb-edu-dedup/train-00103-of-00234.parquet [2024-08-12 15:00:37,240][huggingface_hub.utils._http][WARNING] - Retrying in 2s [Retry 2/5]. [2024-08-12 15:02:31,253][Main][INFO] - [train] Step 71950 out of 80000 | Loss --> 1.703 | Grad_l2 --> 0.343 | Weights_l2 --> 9068.559 | Lr --> 0.000 | Seconds_per_step --> 3.943 | [2024-08-12 15:05:20,828][Main][INFO] - [train] Step 72000 out of 80000 | Loss --> 1.712 | Grad_l2 --> 0.343 | Weights_l2 --> 9068.543 | Lr --> 0.000 | Seconds_per_step --> 3.391 | [2024-08-12 15:08:10,019][Main][INFO] - [train] Step 72050 out of 80000 | Loss --> 1.707 | Grad_l2 --> 0.343 | Weights_l2 --> 9068.528 | Lr --> 0.000 | Seconds_per_step --> 3.384 | [2024-08-12 15:10:59,105][Main][INFO] - [train] Step 72100 out of 80000 | Loss --> 1.702 | Grad_l2 --> 0.343 | Weights_l2 --> 9068.514 | Lr --> 0.000 | Seconds_per_step --> 3.382 | [2024-08-12 15:13:49,571][Main][INFO] - [train] Step 72150 out of 80000 | Loss --> 1.711 | Grad_l2 --> 0.343 | Weights_l2 --> 9068.498 | Lr --> 0.000 | Seconds_per_step --> 3.409 | [2024-08-12 15:16:42,032][Main][INFO] - [train] Step 72200 out of 80000 | Loss --> 1.701 | Grad_l2 --> 0.344 | Weights_l2 --> 9068.483 | Lr --> 0.000 | Seconds_per_step --> 3.449 | [2024-08-12 15:19:30,656][Main][INFO] - [train] 
Step 72250 out of 80000 | Loss --> 1.716 | Grad_l2 --> 0.345 | Weights_l2 --> 9068.470 | Lr --> 0.000 | Seconds_per_step --> 3.372 | [2024-08-12 15:22:20,433][Main][INFO] - [train] Step 72300 out of 80000 | Loss --> 1.712 | Grad_l2 --> 0.344 | Weights_l2 --> 9068.455 | Lr --> 0.000 | Seconds_per_step --> 3.396 | [2024-08-12 15:25:11,089][Main][INFO] - [train] Step 72350 out of 80000 | Loss --> 1.716 | Grad_l2 --> 0.345 | Weights_l2 --> 9068.440 | Lr --> 0.000 | Seconds_per_step --> 3.413 | [2024-08-12 15:28:01,003][Main][INFO] - [train] Step 72400 out of 80000 | Loss --> 1.717 | Grad_l2 --> 0.345 | Weights_l2 --> 9068.426 | Lr --> 0.000 | Seconds_per_step --> 3.398 | [2024-08-12 15:30:50,569][Main][INFO] - [train] Step 72450 out of 80000 | Loss --> 1.713 | Grad_l2 --> 0.343 | Weights_l2 --> 9068.415 | Lr --> 0.000 | Seconds_per_step --> 3.391 | [2024-08-12 15:33:39,952][Main][INFO] - [train] Step 72500 out of 80000 | Loss --> 1.718 | Grad_l2 --> 0.345 | Weights_l2 --> 9068.401 | Lr --> 0.000 | Seconds_per_step --> 3.388 | [2024-08-12 15:36:30,135][Main][INFO] - [train] Step 72550 out of 80000 | Loss --> 1.726 | Grad_l2 --> 0.345 | Weights_l2 --> 9068.388 | Lr --> 0.000 | Seconds_per_step --> 3.404 | [2024-08-12 15:39:19,623][Main][INFO] - [train] Step 72600 out of 80000 | Loss --> 1.719 | Grad_l2 --> 0.345 | Weights_l2 --> 9068.373 | Lr --> 0.000 | Seconds_per_step --> 3.390 | [2024-08-12 15:42:09,023][Main][INFO] - [train] Step 72650 out of 80000 | Loss --> 1.733 | Grad_l2 --> 0.344 | Weights_l2 --> 9068.360 | Lr --> 0.000 | Seconds_per_step --> 3.388 | [2024-08-12 15:44:58,509][Main][INFO] - [train] Step 72700 out of 80000 | Loss --> 1.734 | Grad_l2 --> 0.348 | Weights_l2 --> 9068.347 | Lr --> 0.000 | Seconds_per_step --> 3.390 | [2024-08-12 15:47:49,181][Main][INFO] - [train] Step 72750 out of 80000 | Loss --> 1.721 | Grad_l2 --> 0.344 | Weights_l2 --> 9068.333 | Lr --> 0.000 | Seconds_per_step --> 3.413 | [2024-08-12 15:50:38,995][Main][INFO] - [train] Step 
72800 out of 80000 | Loss --> 1.735 | Grad_l2 --> 0.347 | Weights_l2 --> 9068.322 | Lr --> 0.000 | Seconds_per_step --> 3.396 | [2024-08-12 15:53:28,892][Main][INFO] - [train] Step 72850 out of 80000 | Loss --> 1.730 | Grad_l2 --> 0.346 | Weights_l2 --> 9068.310 | Lr --> 0.000 | Seconds_per_step --> 3.398 | [2024-08-12 15:56:17,941][Main][INFO] - [train] Step 72900 out of 80000 | Loss --> 1.737 | Grad_l2 --> 0.346 | Weights_l2 --> 9068.298 | Lr --> 0.000 | Seconds_per_step --> 3.381 | [2024-08-12 15:59:12,501][Main][INFO] - [train] Step 72950 out of 80000 | Loss --> 1.741 | Grad_l2 --> 0.347 | Weights_l2 --> 9068.285 | Lr --> 0.000 | Seconds_per_step --> 3.491 | [2024-08-12 16:02:37,614][Main][INFO] - [train] Step 73000 out of 80000 | Loss --> 1.742 | Grad_l2 --> 0.348 | Weights_l2 --> 9068.272 | Lr --> 0.000 | Seconds_per_step --> 4.102 | [2024-08-12 16:06:37,371][Main][INFO] - [train] Step 73050 out of 80000 | Loss --> 1.750 | Grad_l2 --> 0.346 | Weights_l2 --> 9068.260 | Lr --> 0.000 | Seconds_per_step --> 4.795 | [2024-08-12 16:10:50,188][Main][INFO] - [train] Step 73100 out of 80000 | Loss --> 1.745 | Grad_l2 --> 0.348 | Weights_l2 --> 9068.247 | Lr --> 0.000 | Seconds_per_step --> 5.056 | [2024-08-12 16:14:47,653][Main][INFO] - [train] Step 73150 out of 80000 | Loss --> 1.752 | Grad_l2 --> 0.348 | Weights_l2 --> 9068.236 | Lr --> 0.000 | Seconds_per_step --> 4.749 | [2024-08-12 16:18:47,203][Main][INFO] - [train] Step 73200 out of 80000 | Loss --> 1.763 | Grad_l2 --> 0.350 | Weights_l2 --> 9068.224 | Lr --> 0.000 | Seconds_per_step --> 4.791 | [2024-08-12 16:22:52,223][Main][INFO] - [train] Step 73250 out of 80000 | Loss --> 1.755 | Grad_l2 --> 0.349 | Weights_l2 --> 9068.212 | Lr --> 0.000 | Seconds_per_step --> 4.900 | [2024-08-12 16:26:58,574][Main][INFO] - [train] Step 73300 out of 80000 | Loss --> 1.755 | Grad_l2 --> 0.348 | Weights_l2 --> 9068.200 | Lr --> 0.000 | Seconds_per_step --> 4.927 | [2024-08-12 16:30:52,333][Main][INFO] - [train] Step 73350 
out of 80000 | Loss --> 1.757 | Grad_l2 --> 0.349 | Weights_l2 --> 9068.191 | Lr --> 0.000 | Seconds_per_step --> 4.675 | [2024-08-12 16:35:02,793][Main][INFO] - [train] Step 73400 out of 80000 | Loss --> 1.757 | Grad_l2 --> 0.351 | Weights_l2 --> 9068.179 | Lr --> 0.000 | Seconds_per_step --> 5.009 | [2024-08-12 16:39:12,998][Main][INFO] - [train] Step 73450 out of 80000 | Loss --> 1.749 | Grad_l2 --> 0.349 | Weights_l2 --> 9068.169 | Lr --> 0.000 | Seconds_per_step --> 5.004 | [2024-08-12 16:43:03,001][Main][INFO] - [train] Step 73500 out of 80000 | Loss --> 1.757 | Grad_l2 --> 0.348 | Weights_l2 --> 9068.158 | Lr --> 0.000 | Seconds_per_step --> 4.600 | [2024-08-12 16:47:03,618][Main][INFO] - [train] Step 73550 out of 80000 | Loss --> 1.759 | Grad_l2 --> 0.347 | Weights_l2 --> 9068.147 | Lr --> 0.000 | Seconds_per_step --> 4.812 | [2024-08-12 16:51:16,923][Main][INFO] - [train] Step 73600 out of 80000 | Loss --> 1.758 | Grad_l2 --> 0.349 | Weights_l2 --> 9068.136 | Lr --> 0.000 | Seconds_per_step --> 5.066 | [2024-08-12 16:55:17,319][Main][INFO] - [train] Step 73650 out of 80000 | Loss --> 1.756 | Grad_l2 --> 0.349 | Weights_l2 --> 9068.127 | Lr --> 0.000 | Seconds_per_step --> 4.808 | [2024-08-12 16:59:11,488][Main][INFO] - [train] Step 73700 out of 80000 | Loss --> 1.757 | Grad_l2 --> 0.349 | Weights_l2 --> 9068.116 | Lr --> 0.000 | Seconds_per_step --> 4.683 | [2024-08-12 17:03:14,319][Main][INFO] - [train] Step 73750 out of 80000 | Loss --> 1.750 | Grad_l2 --> 0.347 | Weights_l2 --> 9068.107 | Lr --> 0.000 | Seconds_per_step --> 4.857 | [2024-08-12 17:07:27,659][Main][INFO] - [train] Step 73800 out of 80000 | Loss --> 1.749 | Grad_l2 --> 0.348 | Weights_l2 --> 9068.097 | Lr --> 0.000 | Seconds_per_step --> 5.067 | [2024-08-12 17:11:27,086][Main][INFO] - [train] Step 73850 out of 80000 | Loss --> 1.755 | Grad_l2 --> 0.348 | Weights_l2 --> 9068.087 | Lr --> 0.000 | Seconds_per_step --> 4.789 | [2024-08-12 17:15:20,740][Main][INFO] - [train] Step 73900 out of 
80000 | Loss --> 1.758 | Grad_l2 --> 0.350 | Weights_l2 --> 9068.078 | Lr --> 0.000 | Seconds_per_step --> 4.673 | [2024-08-12 17:19:26,343][Main][INFO] - [train] Step 73950 out of 80000 | Loss --> 1.753 | Grad_l2 --> 0.349 | Weights_l2 --> 9068.068 | Lr --> 0.000 | Seconds_per_step --> 4.912 | [2024-08-12 17:23:30,087][Main][INFO] - [train] Step 74000 out of 80000 | Loss --> 1.755 | Grad_l2 --> 0.349 | Weights_l2 --> 9068.059 | Lr --> 0.000 | Seconds_per_step --> 4.875 | [2024-08-12 17:27:25,018][Main][INFO] - [train] Step 74050 out of 80000 | Loss --> 1.756 | Grad_l2 --> 0.350 | Weights_l2 --> 9068.050 | Lr --> 0.000 | Seconds_per_step --> 4.699 | [2024-08-12 17:31:28,610][Main][INFO] - [train] Step 74100 out of 80000 | Loss --> 1.758 | Grad_l2 --> 0.349 | Weights_l2 --> 9068.040 | Lr --> 0.000 | Seconds_per_step --> 4.872 | [2024-08-12 17:35:36,982][Main][INFO] - [train] Step 74150 out of 80000 | Loss --> 1.752 | Grad_l2 --> 0.347 | Weights_l2 --> 9068.032 | Lr --> 0.000 | Seconds_per_step --> 4.967 | [2024-08-12 17:38:39,898][Main][INFO] - [train] Step 74200 out of 80000 | Loss --> 1.749 | Grad_l2 --> 0.347 | Weights_l2 --> 9068.024 | Lr --> 0.000 | Seconds_per_step --> 3.658 | [2024-08-12 17:41:29,774][Main][INFO] - [train] Step 74250 out of 80000 | Loss --> 1.753 | Grad_l2 --> 0.350 | Weights_l2 --> 9068.015 | Lr --> 0.000 | Seconds_per_step --> 3.398 | [2024-08-12 17:44:19,599][Main][INFO] - [train] Step 74300 out of 80000 | Loss --> 1.755 | Grad_l2 --> 0.349 | Weights_l2 --> 9068.006 | Lr --> 0.000 | Seconds_per_step --> 3.396 | [2024-08-12 17:47:09,578][Main][INFO] - [train] Step 74350 out of 80000 | Loss --> 1.751 | Grad_l2 --> 0.350 | Weights_l2 --> 9067.997 | Lr --> 0.000 | Seconds_per_step --> 3.400 | [2024-08-12 17:50:00,007][Main][INFO] - [train] Step 74400 out of 80000 | Loss --> 1.757 | Grad_l2 --> 0.349 | Weights_l2 --> 9067.989 | Lr --> 0.000 | Seconds_per_step --> 3.409 | [2024-08-12 17:52:48,380][Main][INFO] - [train] Step 74450 out of 80000 | 
Loss --> 1.751 | Grad_l2 --> 0.350 | Weights_l2 --> 9067.980 | Lr --> 0.000 | Seconds_per_step --> 3.367 | [2024-08-12 17:55:37,403][Main][INFO] - [train] Step 74500 out of 80000 | Loss --> 1.750 | Grad_l2 --> 0.347 | Weights_l2 --> 9067.972 | Lr --> 0.000 | Seconds_per_step --> 3.380 | [2024-08-12 17:58:27,279][Main][INFO] - [train] Step 74550 out of 80000 | Loss --> 1.754 | Grad_l2 --> 0.348 | Weights_l2 --> 9067.965 | Lr --> 0.000 | Seconds_per_step --> 3.398 | [2024-08-12 18:01:17,245][Main][INFO] - [train] Step 74600 out of 80000 | Loss --> 1.749 | Grad_l2 --> 0.350 | Weights_l2 --> 9067.957 | Lr --> 0.000 | Seconds_per_step --> 3.399 | [2024-08-12 18:04:05,815][Main][INFO] - [train] Step 74650 out of 80000 | Loss --> 1.750 | Grad_l2 --> 0.351 | Weights_l2 --> 9067.950 | Lr --> 0.000 | Seconds_per_step --> 3.371 | [2024-08-12 18:06:55,310][Main][INFO] - [train] Step 74700 out of 80000 | Loss --> 1.743 | Grad_l2 --> 0.349 | Weights_l2 --> 9067.942 | Lr --> 0.000 | Seconds_per_step --> 3.390 | [2024-08-12 18:09:44,130][Main][INFO] - [train] Step 74750 out of 80000 | Loss --> 1.751 | Grad_l2 --> 0.350 | Weights_l2 --> 9067.934 | Lr --> 0.000 | Seconds_per_step --> 3.376 | [2024-08-12 18:12:34,080][Main][INFO] - [train] Step 74800 out of 80000 | Loss --> 1.746 | Grad_l2 --> 0.348 | Weights_l2 --> 9067.926 | Lr --> 0.000 | Seconds_per_step --> 3.399 | [2024-08-12 18:15:24,419][Main][INFO] - [train] Step 74850 out of 80000 | Loss --> 1.744 | Grad_l2 --> 0.349 | Weights_l2 --> 9067.919 | Lr --> 0.000 | Seconds_per_step --> 3.407 | [2024-08-12 18:18:12,739][Main][INFO] - [train] Step 74900 out of 80000 | Loss --> 1.747 | Grad_l2 --> 0.349 | Weights_l2 --> 9067.912 | Lr --> 0.000 | Seconds_per_step --> 3.366 | [2024-08-12 18:21:02,709][Main][INFO] - [train] Step 74950 out of 80000 | Loss --> 1.754 | Grad_l2 --> 0.349 | Weights_l2 --> 9067.905 | Lr --> 0.000 | Seconds_per_step --> 3.399 | [2024-08-12 18:23:52,761][Main][INFO] - [train] Step 75000 out of 80000 | Loss --> 
1.750 | Grad_l2 --> 0.350 | Weights_l2 --> 9067.898 | Lr --> 0.000 | Seconds_per_step --> 3.401 | [2024-08-12 18:23:52,762][accelerate.accelerator][INFO] - Saving current state to checkpoint-pt-75000 [2024-08-12 18:23:52,765][accelerate.utils.other][WARNING] - Removed shared tensor {'encoder.embed_tokens.weight', 'decoder.embed_tokens.weight'} while saving. This should be OK, but check by verifying that you don't receive any warning while reloading [2024-08-12 18:23:55,432][accelerate.checkpointing][INFO] - Model weights saved in checkpoint-pt-75000/model.safetensors [2024-08-12 18:23:58,451][accelerate.checkpointing][INFO] - Optimizer state saved in checkpoint-pt-75000/optimizer.bin [2024-08-12 18:23:58,451][accelerate.checkpointing][INFO] - Scheduler state saved in checkpoint-pt-75000/scheduler.bin [2024-08-12 18:23:58,452][accelerate.checkpointing][INFO] - Sampler state for dataloader 0 saved in checkpoint-pt-75000/sampler.bin [2024-08-12 18:23:58,452][accelerate.checkpointing][INFO] - Sampler state for dataloader 1 saved in checkpoint-pt-75000/sampler_1.bin [2024-08-12 18:23:58,452][accelerate.checkpointing][INFO] - Random states saved in checkpoint-pt-75000/random_states_0.pkl [2024-08-12 18:26:49,045][Main][INFO] - [train] Step 75050 out of 80000 | Loss --> 1.754 | Grad_l2 --> 0.349 | Weights_l2 --> 9067.891 | Lr --> 0.000 | Seconds_per_step --> 3.526 | [2024-08-12 18:29:38,847][Main][INFO] - [train] Step 75100 out of 80000 | Loss --> 1.746 | Grad_l2 --> 0.351 | Weights_l2 --> 9067.885 | Lr --> 0.000 | Seconds_per_step --> 3.396 | [2024-08-12 18:32:29,593][Main][INFO] - [train] Step 75150 out of 80000 | Loss --> 1.745 | Grad_l2 --> 0.350 | Weights_l2 --> 9067.879 | Lr --> 0.000 | Seconds_per_step --> 3.415 | [2024-08-12 18:35:19,147][Main][INFO] - [train] Step 75200 out of 80000 | Loss --> 1.748 | Grad_l2 --> 0.350 | Weights_l2 --> 9067.873 | Lr --> 0.000 | Seconds_per_step --> 3.391 | [2024-08-12 18:38:08,943][Main][INFO] - [train] Step 75250 out of 80000 | 
Loss --> 1.737 | Grad_l2 --> 0.350 | Weights_l2 --> 9067.866 | Lr --> 0.000 | Seconds_per_step --> 3.396 | [2024-08-12 18:40:57,335][Main][INFO] - [train] Step 75300 out of 80000 | Loss --> 1.750 | Grad_l2 --> 0.350 | Weights_l2 --> 9067.860 | Lr --> 0.000 | Seconds_per_step --> 3.368 | [2024-08-12 18:43:47,536][Main][INFO] - [train] Step 75350 out of 80000 | Loss --> 1.746 | Grad_l2 --> 0.351 | Weights_l2 --> 9067.854 | Lr --> 0.000 | Seconds_per_step --> 3.404 | [2024-08-12 18:46:36,557][Main][INFO] - [train] Step 75400 out of 80000 | Loss --> 1.744 | Grad_l2 --> 0.348 | Weights_l2 --> 9067.849 | Lr --> 0.000 | Seconds_per_step --> 3.380 | [2024-08-12 18:49:26,702][Main][INFO] - [train] Step 75450 out of 80000 | Loss --> 1.742 | Grad_l2 --> 0.349 | Weights_l2 --> 9067.843 | Lr --> 0.000 | Seconds_per_step --> 3.403 | [2024-08-12 18:52:16,129][Main][INFO] - [train] Step 75500 out of 80000 | Loss --> 1.747 | Grad_l2 --> 0.349 | Weights_l2 --> 9067.838 | Lr --> 0.000 | Seconds_per_step --> 3.389 | [2024-08-12 18:55:06,083][Main][INFO] - [train] Step 75550 out of 80000 | Loss --> 1.750 | Grad_l2 --> 0.352 | Weights_l2 --> 9067.831 | Lr --> 0.000 | Seconds_per_step --> 3.399 | [2024-08-12 18:57:56,288][Main][INFO] - [train] Step 75600 out of 80000 | Loss --> 1.751 | Grad_l2 --> 0.350 | Weights_l2 --> 9067.826 | Lr --> 0.000 | Seconds_per_step --> 3.404 | [2024-08-12 19:00:47,552][Main][INFO] - [train] Step 75650 out of 80000 | Loss --> 1.745 | Grad_l2 --> 0.349 | Weights_l2 --> 9067.821 | Lr --> 0.000 | Seconds_per_step --> 3.425 | [2024-08-12 19:03:37,554][Main][INFO] - [train] Step 75700 out of 80000 | Loss --> 1.748 | Grad_l2 --> 0.351 | Weights_l2 --> 9067.816 | Lr --> 0.000 | Seconds_per_step --> 3.400 | [2024-08-12 19:06:26,390][Main][INFO] - [train] Step 75750 out of 80000 | Loss --> 1.735 | Grad_l2 --> 0.350 | Weights_l2 --> 9067.810 | Lr --> 0.000 | Seconds_per_step --> 3.377 | [2024-08-12 19:09:15,513][Main][INFO] - [train] Step 75800 out of 80000 | Loss --> 
1.741 | Grad_l2 --> 0.350 | Weights_l2 --> 9067.806 | Lr --> 0.000 | Seconds_per_step --> 3.382 | [2024-08-12 19:12:04,817][Main][INFO] - [train] Step 75850 out of 80000 | Loss --> 1.742 | Grad_l2 --> 0.351 | Weights_l2 --> 9067.800 | Lr --> 0.000 | Seconds_per_step --> 3.386 | [2024-08-12 19:14:54,908][Main][INFO] - [train] Step 75900 out of 80000 | Loss --> 1.736 | Grad_l2 --> 0.350 | Weights_l2 --> 9067.795 | Lr --> 0.000 | Seconds_per_step --> 3.402 | [2024-08-12 19:17:44,997][Main][INFO] - [train] Step 75950 out of 80000 | Loss --> 1.738 | Grad_l2 --> 0.350 | Weights_l2 --> 9067.791 | Lr --> 0.000 | Seconds_per_step --> 3.402 | [2024-08-12 19:20:35,427][Main][INFO] - [train] Step 76000 out of 80000 | Loss --> 1.740 | Grad_l2 --> 0.349 | Weights_l2 --> 9067.786 | Lr --> 0.000 | Seconds_per_step --> 3.409 | [2024-08-12 19:23:26,015][Main][INFO] - [train] Step 76050 out of 80000 | Loss --> 1.735 | Grad_l2 --> 0.349 | Weights_l2 --> 9067.781 | Lr --> 0.000 | Seconds_per_step --> 3.412 | [2024-08-12 19:26:15,433][Main][INFO] - [train] Step 76100 out of 80000 | Loss --> 1.731 | Grad_l2 --> 0.350 | Weights_l2 --> 9067.776 | Lr --> 0.000 | Seconds_per_step --> 3.388 | [2024-08-12 19:29:03,758][Main][INFO] - [train] Step 76150 out of 80000 | Loss --> 1.734 | Grad_l2 --> 0.350 | Weights_l2 --> 9067.771 | Lr --> 0.000 | Seconds_per_step --> 3.366 | [2024-08-12 19:31:52,080][Main][INFO] - [train] Step 76200 out of 80000 | Loss --> 1.739 | Grad_l2 --> 0.349 | Weights_l2 --> 9067.767 | Lr --> 0.000 | Seconds_per_step --> 3.366 | [2024-08-12 19:34:45,455][Main][INFO] - [train] Step 76250 out of 80000 | Loss --> 1.733 | Grad_l2 --> 0.349 | Weights_l2 --> 9067.763 | Lr --> 0.000 | Seconds_per_step --> 3.467 | [2024-08-12 19:37:34,591][Main][INFO] - [train] Step 76300 out of 80000 | Loss --> 1.723 | Grad_l2 --> 0.349 | Weights_l2 --> 9067.759 | Lr --> 0.000 | Seconds_per_step --> 3.383 | [2024-08-12 19:40:23,592][Main][INFO] - [train] Step 76350 out of 80000 | Loss --> 1.734 | 
Grad_l2 --> 0.352 | Weights_l2 --> 9067.755 | Lr --> 0.000 | Seconds_per_step --> 3.380 | [2024-08-12 19:43:12,814][Main][INFO] - [train] Step 76400 out of 80000 | Loss --> 1.732 | Grad_l2 --> 0.350 | Weights_l2 --> 9067.751 | Lr --> 0.000 | Seconds_per_step --> 3.384 | [2024-08-12 19:46:01,456][Main][INFO] - [train] Step 76450 out of 80000 | Loss --> 1.723 | Grad_l2 --> 0.350 | Weights_l2 --> 9067.747 | Lr --> 0.000 | Seconds_per_step --> 3.373 | [2024-08-12 19:48:51,400][Main][INFO] - [train] Step 76500 out of 80000 | Loss --> 1.726 | Grad_l2 --> 0.349 | Weights_l2 --> 9067.743 | Lr --> 0.000 | Seconds_per_step --> 3.399 | [2024-08-12 19:51:40,649][Main][INFO] - [train] Step 76550 out of 80000 | Loss --> 1.726 | Grad_l2 --> 0.350 | Weights_l2 --> 9067.740 | Lr --> 0.000 | Seconds_per_step --> 3.385 | [2024-08-12 19:54:29,691][Main][INFO] - [train] Step 76600 out of 80000 | Loss --> 1.726 | Grad_l2 --> 0.350 | Weights_l2 --> 9067.736 | Lr --> 0.000 | Seconds_per_step --> 3.381 | [2024-08-12 19:57:19,230][Main][INFO] - [train] Step 76650 out of 80000 | Loss --> 1.717 | Grad_l2 --> 0.347 | Weights_l2 --> 9067.733 | Lr --> 0.000 | Seconds_per_step --> 3.391 | [2024-08-12 20:00:09,385][Main][INFO] - [train] Step 76700 out of 80000 | Loss --> 1.717 | Grad_l2 --> 0.347 | Weights_l2 --> 9067.729 | Lr --> 0.000 | Seconds_per_step --> 3.403 | [2024-08-12 20:02:57,964][Main][INFO] - [train] Step 76750 out of 80000 | Loss --> 1.715 | Grad_l2 --> 0.351 | Weights_l2 --> 9067.726 | Lr --> 0.000 | Seconds_per_step --> 3.372 | [2024-08-12 20:05:47,101][Main][INFO] - [train] Step 76800 out of 80000 | Loss --> 1.710 | Grad_l2 --> 0.351 | Weights_l2 --> 9067.722 | Lr --> 0.000 | Seconds_per_step --> 3.383 | [2024-08-12 20:08:37,921][Main][INFO] - [train] Step 76850 out of 80000 | Loss --> 1.725 | Grad_l2 --> 0.351 | Weights_l2 --> 9067.719 | Lr --> 0.000 | Seconds_per_step --> 3.416 | [2024-08-12 20:11:28,368][Main][INFO] - [train] Step 76900 out of 80000 | Loss --> 1.716 | Grad_l2 
--> 0.350 | Weights_l2 --> 9067.716 | Lr --> 0.000 | Seconds_per_step --> 3.409 | [2024-08-12 20:14:18,240][Main][INFO] - [train] Step 76950 out of 80000 | Loss --> 1.722 | Grad_l2 --> 0.350 | Weights_l2 --> 9067.713 | Lr --> 0.000 | Seconds_per_step --> 3.397 | [2024-08-12 20:17:07,126][Main][INFO] - [train] Step 77000 out of 80000 | Loss --> 1.727 | Grad_l2 --> 0.348 | Weights_l2 --> 9067.710 | Lr --> 0.000 | Seconds_per_step --> 3.378 | [2024-08-12 20:19:57,021][Main][INFO] - [train] Step 77050 out of 80000 | Loss --> 1.727 | Grad_l2 --> 0.348 | Weights_l2 --> 9067.707 | Lr --> 0.000 | Seconds_per_step --> 3.398 | [2024-08-12 20:22:47,079][Main][INFO] - [train] Step 77100 out of 80000 | Loss --> 1.726 | Grad_l2 --> 0.349 | Weights_l2 --> 9067.705 | Lr --> 0.000 | Seconds_per_step --> 3.401 | [2024-08-12 20:25:36,117][Main][INFO] - [train] Step 77150 out of 80000 | Loss --> 1.720 | Grad_l2 --> 0.349 | Weights_l2 --> 9067.702 | Lr --> 0.000 | Seconds_per_step --> 3.381 | [2024-08-12 20:28:24,552][Main][INFO] - [train] Step 77200 out of 80000 | Loss --> 1.716 | Grad_l2 --> 0.351 | Weights_l2 --> 9067.699 | Lr --> 0.000 | Seconds_per_step --> 3.369 | [2024-08-12 20:31:13,686][Main][INFO] - [train] Step 77250 out of 80000 | Loss --> 1.723 | Grad_l2 --> 0.348 | Weights_l2 --> 9067.696 | Lr --> 0.000 | Seconds_per_step --> 3.383 | [2024-08-12 20:34:04,979][Main][INFO] - [train] Step 77300 out of 80000 | Loss --> 1.724 | Grad_l2 --> 0.349 | Weights_l2 --> 9067.694 | Lr --> 0.000 | Seconds_per_step --> 3.426 | [2024-08-12 20:36:54,247][Main][INFO] - [train] Step 77350 out of 80000 | Loss --> 1.724 | Grad_l2 --> 0.349 | Weights_l2 --> 9067.691 | Lr --> 0.000 | Seconds_per_step --> 3.385 | [2024-08-12 20:39:44,072][Main][INFO] - [train] Step 77400 out of 80000 | Loss --> 1.717 | Grad_l2 --> 0.350 | Weights_l2 --> 9067.689 | Lr --> 0.000 | Seconds_per_step --> 3.396 | [2024-08-12 20:42:33,256][Main][INFO] - [train] Step 77450 out of 80000 | Loss --> 1.721 | Grad_l2 --> 
0.350 | Weights_l2 --> 9067.686 | Lr --> 0.000 | Seconds_per_step --> 3.384 | [2024-08-12 20:45:23,400][Main][INFO] - [train] Step 77500 out of 80000 | Loss --> 1.725 | Grad_l2 --> 0.351 | Weights_l2 --> 9067.684 | Lr --> 0.000 | Seconds_per_step --> 3.403 | [2024-08-12 20:48:13,007][Main][INFO] - [train] Step 77550 out of 80000 | Loss --> 1.723 | Grad_l2 --> 0.350 | Weights_l2 --> 9067.682 | Lr --> 0.000 | Seconds_per_step --> 3.392 | [2024-08-12 20:51:01,893][Main][INFO] - [train] Step 77600 out of 80000 | Loss --> 1.711 | Grad_l2 --> 0.349 | Weights_l2 --> 9067.680 | Lr --> 0.000 | Seconds_per_step --> 3.378 | [2024-08-12 20:53:51,688][Main][INFO] - [train] Step 77650 out of 80000 | Loss --> 1.719 | Grad_l2 --> 0.351 | Weights_l2 --> 9067.678 | Lr --> 0.000 | Seconds_per_step --> 3.396 | [2024-08-12 20:56:42,523][Main][INFO] - [train] Step 77700 out of 80000 | Loss --> 1.718 | Grad_l2 --> 0.349 | Weights_l2 --> 9067.676 | Lr --> 0.000 | Seconds_per_step --> 3.417 | [2024-08-12 20:59:36,305][Main][INFO] - [train] Step 77750 out of 80000 | Loss --> 1.717 | Grad_l2 --> 0.350 | Weights_l2 --> 9067.674 | Lr --> 0.000 | Seconds_per_step --> 3.476 | [2024-08-12 21:02:26,051][Main][INFO] - [train] Step 77800 out of 80000 | Loss --> 1.714 | Grad_l2 --> 0.349 | Weights_l2 --> 9067.672 | Lr --> 0.000 | Seconds_per_step --> 3.395 | [2024-08-12 21:05:15,893][Main][INFO] - [train] Step 77850 out of 80000 | Loss --> 1.719 | Grad_l2 --> 0.349 | Weights_l2 --> 9067.670 | Lr --> 0.000 | Seconds_per_step --> 3.397 | [2024-08-12 21:08:06,462][Main][INFO] - [train] Step 77900 out of 80000 | Loss --> 1.719 | Grad_l2 --> 0.350 | Weights_l2 --> 9067.668 | Lr --> 0.000 | Seconds_per_step --> 3.411 | [2024-08-12 21:10:56,190][Main][INFO] - [train] Step 77950 out of 80000 | Loss --> 1.714 | Grad_l2 --> 0.349 | Weights_l2 --> 9067.666 | Lr --> 0.000 | Seconds_per_step --> 3.395 | [2024-08-12 21:13:44,945][Main][INFO] - [train] Step 78000 out of 80000 | Loss --> 1.714 | Grad_l2 --> 0.351 | 
Weights_l2 --> 9067.665 | Lr --> 0.000 | Seconds_per_step --> 3.375 | [2024-08-12 21:16:34,160][Main][INFO] - [train] Step 78050 out of 80000 | Loss --> 1.710 | Grad_l2 --> 0.351 | Weights_l2 --> 9067.663 | Lr --> 0.000 | Seconds_per_step --> 3.384 | [2024-08-12 21:19:24,074][Main][INFO] - [train] Step 78100 out of 80000 | Loss --> 1.707 | Grad_l2 --> 0.349 | Weights_l2 --> 9067.661 | Lr --> 0.000 | Seconds_per_step --> 3.398 | [2024-08-12 21:22:14,845][Main][INFO] - [train] Step 78150 out of 80000 | Loss --> 1.701 | Grad_l2 --> 0.349 | Weights_l2 --> 9067.660 | Lr --> 0.000 | Seconds_per_step --> 3.415 | [2024-08-12 21:25:04,253][Main][INFO] - [train] Step 78200 out of 80000 | Loss --> 1.707 | Grad_l2 --> 0.350 | Weights_l2 --> 9067.658 | Lr --> 0.000 | Seconds_per_step --> 3.388 | [2024-08-12 21:27:53,707][Main][INFO] - [train] Step 78250 out of 80000 | Loss --> 1.707 | Grad_l2 --> 0.348 | Weights_l2 --> 9067.657 | Lr --> 0.000 | Seconds_per_step --> 3.389 | [2024-08-12 21:30:43,019][Main][INFO] - [train] Step 78300 out of 80000 | Loss --> 1.710 | Grad_l2 --> 0.350 | Weights_l2 --> 9067.656 | Lr --> 0.000 | Seconds_per_step --> 3.386 | [2024-08-12 21:33:33,014][Main][INFO] - [train] Step 78350 out of 80000 | Loss --> 1.710 | Grad_l2 --> 0.350 | Weights_l2 --> 9067.655 | Lr --> 0.000 | Seconds_per_step --> 3.400 | [2024-08-12 21:36:23,186][Main][INFO] - [train] Step 78400 out of 80000 | Loss --> 1.707 | Grad_l2 --> 0.349 | Weights_l2 --> 9067.653 | Lr --> 0.000 | Seconds_per_step --> 3.403 | [2024-08-12 21:39:12,309][Main][INFO] - [train] Step 78450 out of 80000 | Loss --> 1.720 | Grad_l2 --> 0.350 | Weights_l2 --> 9067.652 | Lr --> 0.000 | Seconds_per_step --> 3.382 | [2024-08-12 21:42:03,162][Main][INFO] - [train] Step 78500 out of 80000 | Loss --> 1.714 | Grad_l2 --> 0.350 | Weights_l2 --> 9067.651 | Lr --> 0.000 | Seconds_per_step --> 3.417 | [2024-08-12 21:44:53,192][Main][INFO] - [train] Step 78550 out of 80000 | Loss --> 1.724 | Grad_l2 --> 0.350 | 
Weights_l2 --> 9067.649 | Lr --> 0.000 | Seconds_per_step --> 3.401 | [2024-08-12 21:47:43,149][Main][INFO] - [train] Step 78600 out of 80000 | Loss --> 1.714 | Grad_l2 --> 0.351 | Weights_l2 --> 9067.648 | Lr --> 0.000 | Seconds_per_step --> 3.399 | [2024-08-12 21:50:32,387][Main][INFO] - [train] Step 78650 out of 80000 | Loss --> 1.719 | Grad_l2 --> 0.350 | Weights_l2 --> 9067.647 | Lr --> 0.000 | Seconds_per_step --> 3.385 | [2024-08-12 21:53:22,026][Main][INFO] - [train] Step 78700 out of 80000 | Loss --> 1.714 | Grad_l2 --> 0.350 | Weights_l2 --> 9067.646 | Lr --> 0.000 | Seconds_per_step --> 3.393 | [2024-08-12 21:56:11,907][Main][INFO] - [train] Step 78750 out of 80000 | Loss --> 1.729 | Grad_l2 --> 0.351 | Weights_l2 --> 9067.645 | Lr --> 0.000 | Seconds_per_step --> 3.398 | [2024-08-12 21:59:00,947][Main][INFO] - [train] Step 78800 out of 80000 | Loss --> 1.719 | Grad_l2 --> 0.352 | Weights_l2 --> 9067.644 | Lr --> 0.000 | Seconds_per_step --> 3.381 | [2024-08-12 22:01:50,443][Main][INFO] - [train] Step 78850 out of 80000 | Loss --> 1.722 | Grad_l2 --> 0.352 | Weights_l2 --> 9067.643 | Lr --> 0.000 | Seconds_per_step --> 3.390 | [2024-08-12 22:04:39,064][Main][INFO] - [train] Step 78900 out of 80000 | Loss --> 1.739 | Grad_l2 --> 0.349 | Weights_l2 --> 9067.642 | Lr --> 0.000 | Seconds_per_step --> 3.372 | [2024-08-12 22:07:29,275][Main][INFO] - [train] Step 78950 out of 80000 | Loss --> 1.726 | Grad_l2 --> 0.349 | Weights_l2 --> 9067.641 | Lr --> 0.000 | Seconds_per_step --> 3.404 | [2024-08-12 22:10:18,964][Main][INFO] - [train] Step 79000 out of 80000 | Loss --> 1.729 | Grad_l2 --> 0.352 | Weights_l2 --> 9067.640 | Lr --> 0.000 | Seconds_per_step --> 3.394 | [2024-08-12 22:13:08,518][Main][INFO] - [train] Step 79050 out of 80000 | Loss --> 1.724 | Grad_l2 --> 0.351 | Weights_l2 --> 9067.639 | Lr --> 0.000 | Seconds_per_step --> 3.391 | [2024-08-12 22:15:58,096][Main][INFO] - [train] Step 79100 out of 80000 | Loss --> 1.731 | Grad_l2 --> 0.351 | 
Weights_l2 --> 9067.638 | Lr --> 0.000 | Seconds_per_step --> 3.392 | [2024-08-12 22:18:47,746][Main][INFO] - [train] Step 79150 out of 80000 | Loss --> 1.730 | Grad_l2 --> 0.352 | Weights_l2 --> 9067.637 | Lr --> 0.000 | Seconds_per_step --> 3.393 | [2024-08-12 22:21:37,645][Main][INFO] - [train] Step 79200 out of 80000 | Loss --> 1.733 | Grad_l2 --> 0.351 | Weights_l2 --> 9067.637 | Lr --> 0.000 | Seconds_per_step --> 3.398 | [2024-08-12 22:24:27,038][Main][INFO] - [train] Step 79250 out of 80000 | Loss --> 1.733 | Grad_l2 --> 0.350 | Weights_l2 --> 9067.636 | Lr --> 0.000 | Seconds_per_step --> 3.388 | [2024-08-12 22:27:16,116][Main][INFO] - [train] Step 79300 out of 80000 | Loss --> 1.735 | Grad_l2 --> 0.350 | Weights_l2 --> 9067.635 | Lr --> 0.000 | Seconds_per_step --> 3.382 | [2024-08-12 22:30:05,701][Main][INFO] - [train] Step 79350 out of 80000 | Loss --> 1.733 | Grad_l2 --> 0.351 | Weights_l2 --> 9067.635 | Lr --> 0.000 | Seconds_per_step --> 3.392 | [2024-08-12 22:32:54,419][Main][INFO] - [train] Step 79400 out of 80000 | Loss --> 1.730 | Grad_l2 --> 0.352 | Weights_l2 --> 9067.634 | Lr --> 0.000 | Seconds_per_step --> 3.374 | [2024-08-12 22:35:43,068][Main][INFO] - [train] Step 79450 out of 80000 | Loss --> 1.739 | Grad_l2 --> 0.351 | Weights_l2 --> 9067.633 | Lr --> 0.000 | Seconds_per_step --> 3.373 | [2024-08-12 22:38:32,666][Main][INFO] - [train] Step 79500 out of 80000 | Loss --> 1.745 | Grad_l2 --> 0.353 | Weights_l2 --> 9067.633 | Lr --> 0.000 | Seconds_per_step --> 3.392 | [2024-08-12 22:41:21,888][Main][INFO] - [train] Step 79550 out of 80000 | Loss --> 1.723 | Grad_l2 --> 0.350 | Weights_l2 --> 9067.632 | Lr --> 0.000 | Seconds_per_step --> 3.384 | [2024-08-12 22:44:11,844][Main][INFO] - [train] Step 79600 out of 80000 | Loss --> 1.742 | Grad_l2 --> 0.352 | Weights_l2 --> 9067.631 | Lr --> 0.000 | Seconds_per_step --> 3.399 | [2024-08-12 22:47:00,902][Main][INFO] - [train] Step 79650 out of 80000 | Loss --> 1.741 | Grad_l2 --> 0.352 | 
Weights_l2 --> 9067.630 | Lr --> 0.000 | Seconds_per_step --> 3.381 | [2024-08-12 22:49:50,032][Main][INFO] - [train] Step 79700 out of 80000 | Loss --> 1.739 | Grad_l2 --> 0.352 | Weights_l2 --> 9067.629 | Lr --> 0.000 | Seconds_per_step --> 3.383 | [2024-08-12 22:52:39,520][Main][INFO] - [train] Step 79750 out of 80000 | Loss --> 1.743 | Grad_l2 --> 0.351 | Weights_l2 --> 9067.629 | Lr --> 0.000 | Seconds_per_step --> 3.390 | [2024-08-12 22:55:29,331][Main][INFO] - [train] Step 79800 out of 80000 | Loss --> 1.745 | Grad_l2 --> 0.351 | Weights_l2 --> 9067.629 | Lr --> 0.000 | Seconds_per_step --> 3.396 | [2024-08-12 22:58:19,091][Main][INFO] - [train] Step 79850 out of 80000 | Loss --> 1.746 | Grad_l2 --> 0.352 | Weights_l2 --> 9067.628 | Lr --> 0.000 | Seconds_per_step --> 3.395 | [2024-08-12 23:01:08,868][Main][INFO] - [train] Step 79900 out of 80000 | Loss --> 1.737 | Grad_l2 --> 0.353 | Weights_l2 --> 9067.627 | Lr --> 0.000 | Seconds_per_step --> 3.396 | [2024-08-12 23:03:57,751][Main][INFO] - [train] Step 79950 out of 80000 | Loss --> 1.745 | Grad_l2 --> 0.352 | Weights_l2 --> 9067.627 | Lr --> 0.000 | Seconds_per_step --> 3.378 | [2024-08-12 23:06:47,601][Main][INFO] - [train] Step 80000 out of 80000 | Loss --> 1.740 | Grad_l2 --> 0.351 | Weights_l2 --> 9067.626 | Lr --> 0.000 | Seconds_per_step --> 3.397 | [2024-08-12 23:06:47,602][accelerate.accelerator][INFO] - Saving current state to checkpoint-pt-80000 [2024-08-12 23:06:47,605][accelerate.utils.other][WARNING] - Removed shared tensor {'encoder.embed_tokens.weight', 'decoder.embed_tokens.weight'} while saving. 
This should be OK, but check by verifying that you don't receive any warning while reloading
[2024-08-12 23:06:50,280][accelerate.checkpointing][INFO] - Model weights saved in checkpoint-pt-80000/model.safetensors
[2024-08-12 23:06:53,299][accelerate.checkpointing][INFO] - Optimizer state saved in checkpoint-pt-80000/optimizer.bin
[2024-08-12 23:06:53,299][accelerate.checkpointing][INFO] - Scheduler state saved in checkpoint-pt-80000/scheduler.bin
[2024-08-12 23:06:53,299][accelerate.checkpointing][INFO] - Sampler state for dataloader 0 saved in checkpoint-pt-80000/sampler.bin
[2024-08-12 23:06:53,299][accelerate.checkpointing][INFO] - Sampler state for dataloader 1 saved in checkpoint-pt-80000/sampler_1.bin
[2024-08-12 23:06:53,300][accelerate.checkpointing][INFO] - Random states saved in checkpoint-pt-80000/random_states_0.pkl
[2024-08-12 23:16:15,352][Main][INFO] - [eval] Step 80001 out of 80000 | Loss --> 2.053 | Accuracy --> 0.608 | Time --> 561.184 |
[2024-08-12 23:16:15,356][accelerate.accelerator][INFO] - Saving current state to checkpoint-pt-80001
[2024-08-12 23:16:15,361][accelerate.utils.other][WARNING] - Removed shared tensor {'encoder.embed_tokens.weight', 'decoder.embed_tokens.weight'} while saving.
This should be OK, but check by verifying that you don't receive any warning while reloading
[2024-08-12 23:16:17,400][accelerate.checkpointing][INFO] - Model weights saved in checkpoint-pt-80001/model.safetensors
[2024-08-12 23:16:20,285][accelerate.checkpointing][INFO] - Optimizer state saved in checkpoint-pt-80001/optimizer.bin
[2024-08-12 23:16:20,285][accelerate.checkpointing][INFO] - Scheduler state saved in checkpoint-pt-80001/scheduler.bin
[2024-08-12 23:16:20,285][accelerate.checkpointing][INFO] - Sampler state for dataloader 0 saved in checkpoint-pt-80001/sampler.bin
[2024-08-12 23:16:20,285][accelerate.checkpointing][INFO] - Sampler state for dataloader 1 saved in checkpoint-pt-80001/sampler_1.bin
[2024-08-12 23:16:20,286][accelerate.checkpointing][INFO] - Random states saved in checkpoint-pt-80001/random_states_0.pkl