[2024-08-09 08:30:30,106][Main][INFO] - Distributed environment: NO Num processes: 1 Process index: 0 Local process index: 0 Device: cuda Mixed precision type: bf16
[2024-08-09 08:30:30,106][Main][INFO] - Working directory is /workspace/nanoT5/logs/2024-08-09/08-30-29-
[2024-08-09 08:38:01,730][Main][INFO] - [train] Step 50 out of 80000 | Loss --> 60.113 | Grad_l2 --> 186.709 | Weights_l2 --> 8624.587 | Lr --> 0.004 | Seconds_per_step --> 8.363 |
[2024-08-09 08:42:09,928][Main][INFO] - [train] Step 100 out of 80000 | Loss --> 22.120 | Grad_l2 --> 47.074 | Weights_l2 --> 8624.166 | Lr --> 0.004 | Seconds_per_step --> 4.964 |
[2024-08-09 08:46:13,808][Main][INFO] - [train] Step 150 out of 80000 | Loss --> 12.856 | Grad_l2 --> 28.865 | Weights_l2 --> 8623.587 | Lr --> 0.004 | Seconds_per_step --> 4.878 |
[2024-08-09 08:50:08,941][Main][INFO] - [train] Step 200 out of 80000 | Loss --> 10.357 | Grad_l2 --> 30.528 | Weights_l2 --> 8623.073 | Lr --> 0.004 | Seconds_per_step --> 4.703 |
[2024-08-09 08:54:06,924][Main][INFO] - [train] Step 250 out of 80000 | Loss --> 8.792 | Grad_l2 --> 17.202 | Weights_l2 --> 8622.533 | Lr --> 0.004 | Seconds_per_step --> 4.760 |
[2024-08-09 08:58:12,688][Main][INFO] - [train] Step 300 out of 80000 | Loss --> 7.720 | Grad_l2 --> 12.189 | Weights_l2 --> 8622.034 | Lr --> 0.004 | Seconds_per_step --> 4.915 |
[2024-08-09 09:02:09,434][Main][INFO] - [train] Step 350 out of 80000 | Loss --> 7.276 | Grad_l2 --> 10.214 | Weights_l2 --> 8621.544 | Lr --> 0.004 | Seconds_per_step --> 4.735 |
[2024-08-09 09:06:02,511][Main][INFO] - [train] Step 400 out of 80000 | Loss --> 7.054 | Grad_l2 --> 10.111 | Weights_l2 --> 8621.091 | Lr --> 0.004 | Seconds_per_step --> 4.662 |
[2024-08-09 09:10:08,058][Main][INFO] - [train] Step 450 out of 80000 | Loss --> 6.941 | Grad_l2 --> 9.960 | Weights_l2 --> 8620.672 | Lr --> 0.004 | Seconds_per_step --> 4.911 |
[2024-08-09 09:14:09,465][Main][INFO] - [train] Step 500 out of 80000 | Loss --> 6.777 | Grad_l2 --> 9.558 | Weights_l2 --> 8620.252 | Lr --> 0.004 | Seconds_per_step --> 4.828 |
[2024-08-09 09:17:58,397][Main][INFO] - [train] Step 550 out of 80000 | Loss --> 6.730 | Grad_l2 --> 9.024 | Weights_l2 --> 8619.864 | Lr --> 0.004 | Seconds_per_step --> 4.579 |
[2024-08-09 09:21:50,846][Main][INFO] - [train] Step 600 out of 80000 | Loss --> 6.626 | Grad_l2 --> 7.926 | Weights_l2 --> 8619.457 | Lr --> 0.004 | Seconds_per_step --> 4.649 |
[2024-08-09 09:25:58,874][Main][INFO] - [train] Step 650 out of 80000 | Loss --> 6.504 | Grad_l2 --> 6.422 | Weights_l2 --> 8619.040 | Lr --> 0.004 | Seconds_per_step --> 4.961 |
[2024-08-09 09:29:54,562][Main][INFO] - [train] Step 700 out of 80000 | Loss --> 6.425 | Grad_l2 --> 6.909 | Weights_l2 --> 8618.645 | Lr --> 0.004 | Seconds_per_step --> 4.714 |
[2024-08-09 09:33:50,054][Main][INFO] - [train] Step 750 out of 80000 | Loss --> 6.413 | Grad_l2 --> 6.699 | Weights_l2 --> 8618.254 | Lr --> 0.004 | Seconds_per_step --> 4.710 |
[2024-08-09 09:37:48,669][Main][INFO] - [train] Step 800 out of 80000 | Loss --> 6.339 | Grad_l2 --> 4.883 | Weights_l2 --> 8617.828 | Lr --> 0.004 | Seconds_per_step --> 4.772 |
[2024-08-09 09:41:53,765][Main][INFO] - [train] Step 850 out of 80000 | Loss --> 6.305 | Grad_l2 --> 5.402 | Weights_l2 --> 8617.423 | Lr --> 0.004 | Seconds_per_step --> 4.902 |
[2024-08-09 09:45:52,215][Main][INFO] - [train] Step 900 out of 80000 | Loss --> 6.254 | Grad_l2 --> 5.631 | Weights_l2 --> 8617.040 | Lr --> 0.004 | Seconds_per_step --> 4.769 |
[2024-08-09 09:49:47,148][Main][INFO] - [train] Step 950 out of 80000 | Loss --> 6.232 | Grad_l2 --> 5.005 | Weights_l2 --> 8616.646 | Lr --> 0.004 | Seconds_per_step --> 4.699 |
[2024-08-09 09:53:46,382][Main][INFO] - [train] Step 1000 out of 80000 | Loss --> 6.170 | Grad_l2 --> 5.456 | Weights_l2 --> 8616.274 | Lr --> 0.004 | Seconds_per_step --> 4.785 |
[2024-08-09 09:57:42,782][Main][INFO] - [train] Step 1050 out of 80000 | Loss --> 6.163 | Grad_l2 --> 3.954 | Weights_l2 --> 8615.859 | Lr --> 0.004 | Seconds_per_step --> 4.728 |
[2024-08-09 10:01:39,784][Main][INFO] - [train] Step 1100 out of 80000 | Loss --> 6.153 | Grad_l2 --> 4.661 | Weights_l2 --> 8615.485 | Lr --> 0.004 | Seconds_per_step --> 4.740 |
[2024-08-09 10:05:37,074][Main][INFO] - [train] Step 1150 out of 80000 | Loss --> 6.120 | Grad_l2 --> 4.405 | Weights_l2 --> 8615.110 | Lr --> 0.004 | Seconds_per_step --> 4.746 |
[2024-08-09 10:09:42,375][Main][INFO] - [train] Step 1200 out of 80000 | Loss --> 6.095 | Grad_l2 --> 4.862 | Weights_l2 --> 8614.756 | Lr --> 0.004 | Seconds_per_step --> 4.906 |
[2024-08-09 10:13:44,826][Main][INFO] - [train] Step 1250 out of 80000 | Loss --> 6.065 | Grad_l2 --> 3.995 | Weights_l2 --> 8614.382 | Lr --> 0.004 | Seconds_per_step --> 4.849 |
[2024-08-09 10:17:45,169][Main][INFO] - [train] Step 1300 out of 80000 | Loss --> 5.987 | Grad_l2 --> 4.501 | Weights_l2 --> 8614.025 | Lr --> 0.005 | Seconds_per_step --> 4.807 |
[2024-08-09 10:21:46,890][Main][INFO] - [train] Step 1350 out of 80000 | Loss --> 6.011 | Grad_l2 --> 4.330 | Weights_l2 --> 8613.671 | Lr --> 0.005 | Seconds_per_step --> 4.834 |
[2024-08-09 10:25:46,445][Main][INFO] - [train] Step 1400 out of 80000 | Loss --> 5.968 | Grad_l2 --> 4.033 | Weights_l2 --> 8613.308 | Lr --> 0.005 | Seconds_per_step --> 4.791 |
[2024-08-09 10:29:35,135][Main][INFO] - [train] Step 1450 out of 80000 | Loss --> 5.965 | Grad_l2 --> 3.817 | Weights_l2 --> 8612.959 | Lr --> 0.005 | Seconds_per_step --> 4.574 |
[2024-08-09 10:33:33,627][Main][INFO] - [train] Step 1500 out of 80000 | Loss --> 5.926 | Grad_l2 --> 3.525 | Weights_l2 --> 8612.605 | Lr --> 0.005 | Seconds_per_step --> 4.770 |
[2024-08-09 10:37:31,600][Main][INFO] - [train] Step 1550 out of 80000 | Loss --> 5.908 | Grad_l2 --> 3.178 | Weights_l2 --> 8612.265 | Lr --> 0.005 | Seconds_per_step --> 4.759 |
[2024-08-09 10:41:26,179][Main][INFO] - [train] Step 1600 out of 80000 | Loss --> 5.878 | Grad_l2 --> 3.430 | Weights_l2 --> 8611.930 | Lr --> 0.005 | Seconds_per_step --> 4.692 |
[2024-08-09 10:45:17,990][Main][INFO] - [train] Step 1650 out of 80000 | Loss --> 5.864 | Grad_l2 --> 3.399 | Weights_l2 --> 8611.598 | Lr --> 0.005 | Seconds_per_step --> 4.636 |
[2024-08-09 10:49:16,915][Main][INFO] - [train] Step 1700 out of 80000 | Loss --> 5.845 | Grad_l2 --> 3.266 | Weights_l2 --> 8611.279 | Lr --> 0.005 | Seconds_per_step --> 4.778 |
[2024-08-09 10:53:22,739][Main][INFO] - [train] Step 1750 out of 80000 | Loss --> 5.815 | Grad_l2 --> 3.539 | Weights_l2 --> 8610.973 | Lr --> 0.005 | Seconds_per_step --> 4.916 |
[2024-08-09 10:57:15,819][Main][INFO] - [train] Step 1800 out of 80000 | Loss --> 5.813 | Grad_l2 --> 3.014 | Weights_l2 --> 8610.660 | Lr --> 0.005 | Seconds_per_step --> 4.662 |
[2024-08-09 11:01:07,812][Main][INFO] - [train] Step 1850 out of 80000 | Loss --> 5.781 | Grad_l2 --> 3.157 | Weights_l2 --> 8610.357 | Lr --> 0.005 | Seconds_per_step --> 4.640 |
[2024-08-09 11:05:06,130][Main][INFO] - [train] Step 1900 out of 80000 | Loss --> 5.781 | Grad_l2 --> 2.876 | Weights_l2 --> 8610.069 | Lr --> 0.005 | Seconds_per_step --> 4.766 |
[2024-08-09 11:09:10,053][Main][INFO] - [train] Step 1950 out of 80000 | Loss --> 5.727 | Grad_l2 --> 3.171 | Weights_l2 --> 8609.783 | Lr --> 0.005 | Seconds_per_step --> 4.878 |
[2024-08-09 11:13:04,823][Main][INFO] - [train] Step 2000 out of 80000 | Loss --> 5.701 | Grad_l2 --> 3.384 | Weights_l2 --> 8609.494 | Lr --> 0.005 | Seconds_per_step --> 4.695 |
[2024-08-09 11:16:58,015][Main][INFO] - [train] Step 2050 out of 80000 | Loss --> 5.706 | Grad_l2 --> 2.739 | Weights_l2 --> 8609.191 | Lr --> 0.005 | Seconds_per_step --> 4.664 |
[2024-08-09 11:21:09,220][Main][INFO] - [train] Step 2100 out of 80000 | Loss --> 5.697 | Grad_l2 --> 2.753 | Weights_l2 --> 8608.924 | Lr --> 0.005 | Seconds_per_step --> 5.024 |
[2024-08-09 11:24:59,988][Main][INFO] - [train] Step 2150 out of 80000 | Loss --> 5.679 | Grad_l2 --> 2.713 | Weights_l2 --> 8608.657 | Lr --> 0.005 | Seconds_per_step --> 4.615 |
[2024-08-09 11:28:50,211][Main][INFO] - [train] Step 2200 out of 80000 | Loss --> 5.659 | Grad_l2 --> 2.789 | Weights_l2 --> 8608.401 | Lr --> 0.005 | Seconds_per_step --> 4.604 |
[2024-08-09 11:32:47,428][Main][INFO] - [train] Step 2250 out of 80000 | Loss --> 5.643 | Grad_l2 --> 3.085 | Weights_l2 --> 8608.150 | Lr --> 0.005 | Seconds_per_step --> 4.744 |
[2024-08-09 11:36:52,444][Main][INFO] - [train] Step 2300 out of 80000 | Loss --> 5.606 | Grad_l2 --> 3.170 | Weights_l2 --> 8607.880 | Lr --> 0.005 | Seconds_per_step --> 4.900 |
[2024-08-09 11:40:40,829][Main][INFO] - [train] Step 2350 out of 80000 | Loss --> 5.585 | Grad_l2 --> 2.834 | Weights_l2 --> 8607.632 | Lr --> 0.005 | Seconds_per_step --> 4.568 |
[2024-08-09 11:44:35,220][Main][INFO] - [train] Step 2400 out of 80000 | Loss --> 5.595 | Grad_l2 --> 2.603 | Weights_l2 --> 8607.391 | Lr --> 0.005 | Seconds_per_step --> 4.688 |
[2024-08-09 11:47:52,825][Main][INFO] - [train] Step 2450 out of 80000 | Loss --> 5.571 | Grad_l2 --> 2.616 | Weights_l2 --> 8607.146 | Lr --> 0.005 | Seconds_per_step --> 3.952 |
[2024-08-09 11:50:42,712][Main][INFO] - [train] Step 2500 out of 80000 | Loss --> 5.588 | Grad_l2 --> 2.392 | Weights_l2 --> 8606.913 | Lr --> 0.005 | Seconds_per_step --> 3.398 |
[2024-08-09 11:54:19,840][Main][INFO] - [train] Step 2550 out of 80000 | Loss --> 5.598 | Grad_l2 --> 3.058 | Weights_l2 --> 8606.708 | Lr --> 0.005 | Seconds_per_step --> 4.343 |
[2024-08-09 11:58:07,896][Main][INFO] - [train] Step 2600 out of 80000 | Loss --> 5.554 | Grad_l2 --> 2.508 | Weights_l2 --> 8606.498 | Lr --> 0.005 | Seconds_per_step --> 4.561 |
[2024-08-09 12:02:07,989][Main][INFO] - [train] Step 2650 out of 80000 | Loss --> 5.536 | Grad_l2 --> 2.317 | Weights_l2 --> 8606.300 | Lr --> 0.005 | Seconds_per_step --> 4.802 |
[2024-08-09 12:06:22,355][Main][INFO] - [train] Step 2700 out of 80000 | Loss --> 5.533 | Grad_l2 --> 2.347 | Weights_l2 --> 8606.121 | Lr --> 0.005 | Seconds_per_step --> 5.087 |
[2024-08-09 12:10:05,296][Main][INFO] - [train] Step 2750 out of 80000 | Loss --> 5.502 | Grad_l2 --> 2.522 | Weights_l2 --> 8605.932 | Lr --> 0.005 | Seconds_per_step --> 4.459 |
[2024-08-09 12:13:56,942][Main][INFO] - [train] Step 2800 out of 80000 | Loss --> 5.484 | Grad_l2 --> 2.503 | Weights_l2 --> 8605.729 | Lr --> 0.005 | Seconds_per_step --> 4.633 |
[2024-08-09 12:17:56,310][Main][INFO] - [train] Step 2850 out of 80000 | Loss --> 5.471 | Grad_l2 --> 2.559 | Weights_l2 --> 8605.524 | Lr --> 0.005 | Seconds_per_step --> 4.787 |
[2024-08-09 12:21:50,249][Main][INFO] - [train] Step 2900 out of 80000 | Loss --> 5.463 | Grad_l2 --> 2.446 | Weights_l2 --> 8605.344 | Lr --> 0.005 | Seconds_per_step --> 4.679 |
[2024-08-09 12:25:43,300][Main][INFO] - [train] Step 2950 out of 80000 | Loss --> 5.481 | Grad_l2 --> 2.152 | Weights_l2 --> 8605.182 | Lr --> 0.005 | Seconds_per_step --> 4.661 |
[2024-08-09 12:29:34,779][Main][INFO] - [train] Step 3000 out of 80000 | Loss --> 5.444 | Grad_l2 --> 2.267 | Weights_l2 --> 8605.025 | Lr --> 0.005 | Seconds_per_step --> 4.630 |
[2024-08-09 12:33:43,889][Main][INFO] - [train] Step 3050 out of 80000 | Loss --> 5.445 | Grad_l2 --> 2.029 | Weights_l2 --> 8604.870 | Lr --> 0.005 | Seconds_per_step --> 4.982 |
[2024-08-09 12:37:33,552][Main][INFO] - [train] Step 3100 out of 80000 | Loss --> 5.439 | Grad_l2 --> 2.249 | Weights_l2 --> 8604.734 | Lr --> 0.005 | Seconds_per_step --> 4.593 |
[2024-08-09 12:41:33,458][Main][INFO] - [train] Step 3150 out of 80000 | Loss --> 5.390 | Grad_l2 --> 2.281 | Weights_l2 --> 8604.574 | Lr --> 0.005 | Seconds_per_step --> 4.798 |
[2024-08-09 12:45:28,169][Main][INFO] - [train] Step 3200 out of 80000 | Loss --> 5.395 | Grad_l2 --> 2.124 | Weights_l2 --> 8604.424 | Lr --> 0.005 | Seconds_per_step --> 4.694 |
[2024-08-09 12:49:31,716][Main][INFO] - [train] Step 3250 out of 80000 | Loss --> 5.381 | Grad_l2 --> 2.379 | Weights_l2 --> 8604.286 | Lr --> 0.005 | Seconds_per_step --> 4.871 |
[2024-08-09 12:53:26,686][Main][INFO] - [train] Step 3300 out of 80000 | Loss --> 5.365 | Grad_l2 --> 2.335 | Weights_l2 --> 8604.130 | Lr --> 0.005 | Seconds_per_step --> 4.699 |
[2024-08-09 12:57:18,564][Main][INFO] - [train] Step 3350 out of 80000 | Loss --> 5.365 | Grad_l2 --> 2.185 | Weights_l2 --> 8603.989 | Lr --> 0.005 | Seconds_per_step --> 4.638 |
[2024-08-09 13:01:23,837][Main][INFO] - [train] Step 3400 out of 80000 | Loss --> 5.347 | Grad_l2 --> 2.330 | Weights_l2 --> 8603.845 | Lr --> 0.005 | Seconds_per_step --> 4.905 |
[2024-08-09 13:05:16,575][Main][INFO] - [train] Step 3450 out of 80000 | Loss --> 5.349 | Grad_l2 --> 1.951 | Weights_l2 --> 8603.727 | Lr --> 0.005 | Seconds_per_step --> 4.655 |
[2024-08-09 13:08:27,542][Main][INFO] - [train] Step 3500 out of 80000 | Loss --> 5.356 | Grad_l2 --> 1.986 | Weights_l2 --> 8603.662 | Lr --> 0.005 | Seconds_per_step --> 3.819 |
[2024-08-09 13:12:30,541][Main][INFO] - [train] Step 3550 out of 80000 | Loss --> 5.312 | Grad_l2 --> 2.396 | Weights_l2 --> 8603.545 | Lr --> 0.005 | Seconds_per_step --> 4.860 |
[2024-08-09 13:16:49,213][Main][INFO] - [train] Step 3600 out of 80000 | Loss --> 5.299 | Grad_l2 --> 2.230 | Weights_l2 --> 8603.411 | Lr --> 0.005 | Seconds_per_step --> 5.173 |
[2024-08-09 13:20:53,058][Main][INFO] - [train] Step 3650 out of 80000 | Loss --> 5.307 | Grad_l2 --> 2.386 | Weights_l2 --> 8603.284 | Lr --> 0.005 | Seconds_per_step --> 4.877 |
[2024-08-09 13:24:44,487][Main][INFO] - [train] Step 3700 out of 80000 | Loss --> 5.293 | Grad_l2 --> 2.071 | Weights_l2 --> 8603.169 | Lr --> 0.005 | Seconds_per_step --> 4.629 |
[2024-08-09 13:28:47,607][Main][INFO] - [train] Step 3750 out of 80000 | Loss --> 5.298 | Grad_l2 --> 2.199 | Weights_l2 --> 8603.065 | Lr --> 0.005 | Seconds_per_step --> 4.862 |
[2024-08-09 13:32:52,512][Main][INFO] - [train] Step 3800 out of 80000 | Loss --> 5.277 | Grad_l2 --> 2.091 | Weights_l2 --> 8602.962 | Lr --> 0.006 | Seconds_per_step --> 4.898 |
[2024-08-09 13:36:42,719][Main][INFO] - [train] Step 3850 out of 80000 | Loss --> 5.284 | Grad_l2 --> 2.042 | Weights_l2 --> 8602.881 | Lr --> 0.006 | Seconds_per_step --> 4.604 |
[2024-08-09 13:40:34,318][Main][INFO] - [train] Step 3900 out of 80000 | Loss --> 5.245 | Grad_l2 --> 2.240 | Weights_l2 --> 8602.781 | Lr --> 0.006 | Seconds_per_step --> 4.632 |
[2024-08-09 13:44:45,754][Main][INFO] - [train] Step 3950 out of 80000 | Loss --> 5.245 | Grad_l2 --> 1.955 | Weights_l2 --> 8602.686 | Lr --> 0.006 | Seconds_per_step --> 5.029 |
[2024-08-09 13:48:39,099][Main][INFO] - [train] Step 4000 out of 80000 | Loss --> 5.257 | Grad_l2 --> 2.011 | Weights_l2 --> 8602.644 | Lr --> 0.006 | Seconds_per_step --> 4.667 |
[2024-08-09 13:52:31,353][Main][INFO] - [train] Step 4050 out of 80000 | Loss --> 5.239 | Grad_l2 --> 1.838 | Weights_l2 --> 8602.573 | Lr --> 0.006 | Seconds_per_step --> 4.645 |
[2024-08-09 13:56:29,186][Main][INFO] - [train] Step 4100 out of 80000 | Loss --> 5.238 | Grad_l2 --> 1.935 | Weights_l2 --> 8602.540 | Lr --> 0.006 | Seconds_per_step --> 4.757 |
[2024-08-09 14:00:27,682][Main][INFO] - [train] Step 4150 out of 80000 | Loss --> 5.211 | Grad_l2 --> 2.014 | Weights_l2 --> 8602.468 | Lr --> 0.006 | Seconds_per_step --> 4.770 |
[2024-08-09 14:04:26,879][Main][INFO] - [train] Step 4200 out of 80000 | Loss --> 5.202 | Grad_l2 --> 2.106 | Weights_l2 --> 8602.418 | Lr --> 0.006 | Seconds_per_step --> 4.784 |
[2024-08-09 14:08:26,097][Main][INFO] - [train] Step 4250 out of 80000 | Loss --> 5.194 | Grad_l2 --> 1.876 | Weights_l2 --> 8602.330 | Lr --> 0.006 | Seconds_per_step --> 4.784 |
[2024-08-09 14:12:43,883][Main][INFO] - [train] Step 4300 out of 80000 | Loss --> 5.216 | Grad_l2 --> 1.692 | Weights_l2 --> 8602.339 | Lr --> 0.006 | Seconds_per_step --> 5.156 |
[2024-08-09 14:16:59,892][Main][INFO] - [train] Step 4350 out of 80000 | Loss --> 5.195 | Grad_l2 --> 1.824 | Weights_l2 --> 8602.342 | Lr --> 0.006 | Seconds_per_step --> 5.120 |
[2024-08-09 14:20:57,072][Main][INFO] - [train] Step 4400 out of 80000 | Loss --> 5.193 | Grad_l2 --> 1.640 | Weights_l2 --> 8602.351 | Lr --> 0.006 | Seconds_per_step --> 4.744 |
[2024-08-09 14:25:01,683][Main][INFO] - [train] Step 4450 out of 80000 | Loss --> 5.186 | Grad_l2 --> 1.790 | Weights_l2 --> 8602.369 | Lr --> 0.006 | Seconds_per_step --> 4.892 |
[2024-08-09 14:29:08,638][Main][INFO] - [train] Step 4500 out of 80000 | Loss --> 5.162 | Grad_l2 --> 1.890 | Weights_l2 --> 8602.364 | Lr --> 0.006 | Seconds_per_step --> 4.939 |
[2024-08-09 14:32:58,390][Main][INFO] - [train] Step 4550 out of 80000 | Loss --> 5.136 | Grad_l2 --> 1.776 | Weights_l2 --> 8602.345 | Lr --> 0.006 | Seconds_per_step --> 4.595 |
[2024-08-09 14:37:00,248][Main][INFO] - [train] Step 4600 out of 80000 | Loss --> 5.135 | Grad_l2 --> 1.661 | Weights_l2 --> 8602.366 | Lr --> 0.006 | Seconds_per_step --> 4.837 |
[2024-08-09 14:41:11,560][Main][INFO] - [train] Step 4650 out of 80000 | Loss --> 5.139 | Grad_l2 --> 1.623 | Weights_l2 --> 8602.434 | Lr --> 0.006 | Seconds_per_step --> 5.026 |
[2024-08-09 14:45:14,951][Main][INFO] - [train] Step 4700 out of 80000 | Loss --> 5.090 | Grad_l2 --> 1.703 | Weights_l2 --> 8602.491 | Lr --> 0.006 | Seconds_per_step --> 4.868 |
[2024-08-09 14:49:09,655][Main][INFO] - [train] Step 4750 out of 80000 | Loss --> 5.056 | Grad_l2 --> 1.918 | Weights_l2 --> 8602.542 | Lr --> 0.006 | Seconds_per_step --> 4.694 |
[2024-08-09 14:53:11,228][Main][INFO] - [train] Step 4800 out of 80000 | Loss --> 5.018 | Grad_l2 --> 1.805 | Weights_l2 --> 8602.552 | Lr --> 0.006 | Seconds_per_step --> 4.831 |
[2024-08-09 14:57:15,004][Main][INFO] - [train] Step 4850 out of 80000 | Loss --> 5.016 | Grad_l2 --> 1.660 | Weights_l2 --> 8602.639 | Lr --> 0.006 | Seconds_per_step --> 4.876 |
[2024-08-09 15:01:09,698][Main][INFO] - [train] Step 4900 out of 80000 | Loss --> 4.994 | Grad_l2 --> 1.595 | Weights_l2 --> 8602.806 | Lr --> 0.006 | Seconds_per_step --> 4.694 |
[2024-08-09 15:04:01,695][Main][INFO] - [train] Step 4950 out of 80000 | Loss --> 4.946 | Grad_l2 --> 1.783 | Weights_l2 --> 8602.949 | Lr --> 0.006 | Seconds_per_step --> 3.440 |
[2024-08-09 15:07:39,946][Main][INFO] - [train] Step 5000 out of 80000 | Loss --> 4.722 | Grad_l2 --> 1.590 | Weights_l2 --> 8603.165 | Lr --> 0.006 | Seconds_per_step --> 4.365 |
[2024-08-09 15:07:39,947][accelerate.accelerator][INFO] - Saving current state to checkpoint-pt-5000
[2024-08-09 15:07:39,951][accelerate.utils.other][WARNING] - Removed shared tensor {'encoder.embed_tokens.weight', 'decoder.embed_tokens.weight'} while saving. This should be OK, but check by verifying that you don't receive any warning while reloading
[2024-08-09 15:07:46,022][accelerate.checkpointing][INFO] - Model weights saved in checkpoint-pt-5000/model.safetensors
[2024-08-09 15:07:49,438][accelerate.checkpointing][INFO] - Optimizer state saved in checkpoint-pt-5000/optimizer.bin
[2024-08-09 15:07:49,439][accelerate.checkpointing][INFO] - Scheduler state saved in checkpoint-pt-5000/scheduler.bin
[2024-08-09 15:07:49,439][accelerate.checkpointing][INFO] - Sampler state for dataloader 0 saved in checkpoint-pt-5000/sampler.bin
[2024-08-09 15:07:49,439][accelerate.checkpointing][INFO] - Sampler state for dataloader 1 saved in checkpoint-pt-5000/sampler_1.bin
[2024-08-09 15:07:49,440][accelerate.checkpointing][INFO] - Random states saved in checkpoint-pt-5000/random_states_0.pkl
[2024-08-09 15:11:55,741][Main][INFO] - [train] Step 5050 out of 80000 | Loss --> 4.582 | Grad_l2 --> 1.679 | Weights_l2 --> 8603.473 | Lr --> 0.006 | Seconds_per_step --> 5.116 |
[2024-08-09 15:15:46,314][Main][INFO] - [train] Step 5100 out of 80000 | Loss --> 4.472 | Grad_l2 --> 1.636 | Weights_l2 --> 8603.746 | Lr --> 0.006 | Seconds_per_step --> 4.611 |
[2024-08-09 15:19:45,374][Main][INFO] - [train] Step 5150 out of 80000 | Loss --> 4.370 | Grad_l2 --> 1.523 | Weights_l2 --> 8604.092 | Lr --> 0.006 | Seconds_per_step --> 4.781 |
[2024-08-09 15:23:51,223][Main][INFO] - [train] Step 5200 out of 80000 | Loss --> 4.267 | Grad_l2 --> 1.542 | Weights_l2 --> 8604.440 | Lr --> 0.006 | Seconds_per_step --> 4.917 |
[2024-08-09 15:27:51,655][Main][INFO] - [train] Step 5250 out of 80000 | Loss --> 4.191 | Grad_l2 --> 1.477 | Weights_l2 --> 8604.872 | Lr --> 0.006 | Seconds_per_step --> 4.809 |
[2024-08-09 15:31:44,251][Main][INFO] - [train] Step 5300 out of 80000 | Loss --> 4.128 | Grad_l2 --> 1.490 | Weights_l2 --> 8605.306 | Lr --> 0.006 | Seconds_per_step --> 4.652 |
[2024-08-09 15:35:40,470][Main][INFO] - [train] Step 5350 out of 80000 | Loss --> 4.067 | Grad_l2 --> 1.397 | Weights_l2 --> 8605.776 | Lr --> 0.006 | Seconds_per_step --> 4.724 |
[2024-08-09 15:39:48,973][Main][INFO] - [train] Step 5400 out of 80000 | Loss --> 4.015 | Grad_l2 --> 1.239 | Weights_l2 --> 8606.428 | Lr --> 0.006 | Seconds_per_step --> 4.970 |
[2024-08-09 15:43:39,070][Main][INFO] - [train] Step 5450 out of 80000 | Loss --> 3.968 | Grad_l2 --> 1.219 | Weights_l2 --> 8607.147 | Lr --> 0.006 | Seconds_per_step --> 4.602 |
[2024-08-09 15:47:34,049][Main][INFO] - [train] Step 5500 out of 80000 | Loss --> 3.903 | Grad_l2 --> 1.203 | Weights_l2 --> 8607.924 | Lr --> 0.006 | Seconds_per_step --> 4.700 |
[2024-08-09 15:51:38,499][Main][INFO] - [train] Step 5550 out of 80000 | Loss --> 3.855 | Grad_l2 --> 1.167 | Weights_l2 --> 8608.720 | Lr --> 0.006 | Seconds_per_step --> 4.889 |
[2024-08-09 15:55:46,120][Main][INFO] - [train] Step 5600 out of 80000 | Loss --> 3.815 | Grad_l2 --> 1.111 | Weights_l2 --> 8609.615 | Lr --> 0.006 | Seconds_per_step --> 4.952 |
[2024-08-09 15:59:40,828][Main][INFO] - [train] Step 5650 out of 80000 | Loss --> 3.768 | Grad_l2 --> 1.066 | Weights_l2 --> 8610.530 | Lr --> 0.006 | Seconds_per_step --> 4.694 |
[2024-08-09 16:03:38,938][Main][INFO] - [train] Step 5700 out of 80000 | Loss --> 3.711 | Grad_l2 --> 1.048 | Weights_l2 --> 8611.436 | Lr --> 0.006 | Seconds_per_step --> 4.762 |
[2024-08-09 16:07:49,871][Main][INFO] - [train] Step 5750 out of 80000 | Loss --> 3.675 | Grad_l2 --> 0.998 | Weights_l2 --> 8612.404 | Lr --> 0.006 | Seconds_per_step --> 5.019 |
[2024-08-09 16:11:53,420][Main][INFO] - [train] Step 5800 out of 80000 | Loss --> 3.625 | Grad_l2 --> 0.993 | Weights_l2 --> 8613.329 | Lr --> 0.006 | Seconds_per_step --> 4.871 |
[2024-08-09 16:15:50,534][Main][INFO] - [train] Step 5850 out of 80000 | Loss --> 3.580 | Grad_l2 --> 0.952 | Weights_l2 --> 8614.289 | Lr --> 0.006 | Seconds_per_step --> 4.742 |
[2024-08-09 16:19:45,983][Main][INFO] - [train] Step 5900 out of 80000 | Loss --> 3.545 | Grad_l2 --> 1.014 | Weights_l2 --> 8615.197 | Lr --> 0.006 | Seconds_per_step --> 4.709 |
[2024-08-09 16:23:51,342][Main][INFO] - [train] Step 5950 out of 80000 | Loss --> 3.522 | Grad_l2 --> 0.927 | Weights_l2 --> 8616.137 | Lr --> 0.006 | Seconds_per_step --> 4.907 |
[2024-08-09 16:27:42,121][Main][INFO] - [train] Step 6000 out of 80000 | Loss --> 3.483 | Grad_l2 --> 0.926 | Weights_l2 --> 8617.066 | Lr --> 0.006 | Seconds_per_step --> 4.616 |
[2024-08-09 16:31:41,278][Main][INFO] - [train] Step 6050 out of 80000 | Loss --> 3.455 | Grad_l2 --> 0.886 | Weights_l2 --> 8617.977 | Lr --> 0.006 | Seconds_per_step --> 4.783 |
[2024-08-09 16:35:47,786][Main][INFO] - [train] Step 6100 out of 80000 | Loss --> 3.428 | Grad_l2 --> 0.956 | Weights_l2 --> 8618.840 | Lr --> 0.006 | Seconds_per_step --> 4.930 |
[2024-08-09 16:39:45,096][Main][INFO] - [train] Step 6150 out of 80000 | Loss --> 3.399 | Grad_l2 --> 0.832 | Weights_l2 --> 8619.684 | Lr --> 0.006 | Seconds_per_step --> 4.746 |
[2024-08-09 16:43:41,554][Main][INFO] - [train] Step 6200 out of 80000 | Loss --> 3.377 | Grad_l2 --> 0.868 | Weights_l2 --> 8620.530 | Lr --> 0.006 | Seconds_per_step --> 4.729 |
[2024-08-09 16:47:45,442][Main][INFO] - [train] Step 6250 out of 80000 | Loss --> 3.363 | Grad_l2 --> 0.850 | Weights_l2 --> 8621.325 | Lr --> 0.006 | Seconds_per_step --> 4.878 |
[2024-08-09 16:51:50,312][Main][INFO] - [train] Step 6300 out of 80000 | Loss --> 3.332 | Grad_l2 --> 0.840 | Weights_l2 --> 8622.117 | Lr --> 0.007 | Seconds_per_step --> 4.897 |
[2024-08-09 16:55:47,619][Main][INFO] - [train] Step 6350 out of 80000 | Loss --> 3.311 | Grad_l2 --> 0.875 | Weights_l2 --> 8622.932 | Lr --> 0.007 | Seconds_per_step --> 4.746 |
[2024-08-09 16:59:44,744][Main][INFO] - [train] Step 6400 out of 80000 | Loss --> 3.289 | Grad_l2 --> 0.808 | Weights_l2 --> 8623.729 | Lr --> 0.007 | Seconds_per_step --> 4.742 |
[2024-08-09 17:03:47,092][Main][INFO] - [train] Step 6450 out of 80000 | Loss --> 3.279 | Grad_l2 --> 0.782 | Weights_l2 --> 8624.498 | Lr --> 0.007 | Seconds_per_step --> 4.847 |
[2024-08-09 17:07:51,580][Main][INFO] - [train] Step 6500 out of 80000 | Loss --> 3.250 | Grad_l2 --> 0.812 | Weights_l2 --> 8625.266 | Lr --> 0.007 | Seconds_per_step --> 4.890 |
[2024-08-09 17:11:44,444][Main][INFO] - [train] Step 6550 out of 80000 | Loss --> 3.248 | Grad_l2 --> 0.806 | Weights_l2 --> 8626.024 | Lr --> 0.007 | Seconds_per_step --> 4.657 |
[2024-08-09 17:15:43,498][Main][INFO] - [train] Step 6600 out of 80000 | Loss --> 3.216 | Grad_l2 --> 0.765 | Weights_l2 --> 8626.794 | Lr --> 0.007 | Seconds_per_step --> 4.781 |
[2024-08-09 17:19:50,311][Main][INFO] - [train] Step 6650 out of 80000 | Loss --> 3.209 | Grad_l2 --> 0.793 | Weights_l2 --> 8627.521 | Lr --> 0.007 | Seconds_per_step --> 4.936 |
[2024-08-09 17:23:54,093][Main][INFO] - [train] Step 6700 out of 80000 | Loss --> 3.200 | Grad_l2 --> 0.788 | Weights_l2 --> 8628.294 | Lr --> 0.007 | Seconds_per_step --> 4.876 |
[2024-08-09 17:27:47,402][Main][INFO] - [train] Step 6750 out of 80000 | Loss --> 3.176 | Grad_l2 --> 0.762 | Weights_l2 --> 8629.053 | Lr --> 0.007 | Seconds_per_step --> 4.666 |
[2024-08-09 17:31:49,523][Main][INFO] - [train] Step 6800 out of 80000 | Loss --> 3.170 | Grad_l2 --> 0.778 | Weights_l2 --> 8629.825 | Lr --> 0.007 | Seconds_per_step --> 4.842 |
[2024-08-09 17:35:52,826][Main][INFO] - [train] Step 6850 out of 80000 | Loss --> 3.159 | Grad_l2 --> 0.775 | Weights_l2 --> 8630.568 | Lr --> 0.007 | Seconds_per_step --> 4.866 |
[2024-08-09 17:39:46,125][Main][INFO] - [train] Step 6900 out of 80000 | Loss --> 3.158 | Grad_l2 --> 0.757 | Weights_l2 --> 8631.325 | Lr --> 0.007 | Seconds_per_step --> 4.666 |
[2024-08-09 17:43:39,817][Main][INFO] - [train] Step 6950 out of 80000 | Loss --> 3.138 | Grad_l2 --> 0.766 | Weights_l2 --> 8632.055 | Lr --> 0.007 | Seconds_per_step --> 4.674 |
[2024-08-09 17:47:44,929][Main][INFO] - [train] Step 7000 out of 80000 | Loss --> 3.123 | Grad_l2 --> 0.759 | Weights_l2 --> 8632.805 | Lr --> 0.007 | Seconds_per_step --> 4.902 |
[2024-08-09 17:51:43,866][Main][INFO] - [train] Step 7050 out of 80000 | Loss --> 3.118 | Grad_l2 --> 0.752 | Weights_l2 --> 8633.540 | Lr --> 0.007 | Seconds_per_step --> 4.779 |
[2024-08-09 17:55:42,820][Main][INFO] - [train] Step 7100 out of 80000 | Loss --> 3.103 | Grad_l2 --> 0.757 | Weights_l2 --> 8634.285 | Lr --> 0.007 | Seconds_per_step --> 4.779 |
[2024-08-09 17:59:44,322][Main][INFO] - [train] Step 7150 out of 80000 | Loss --> 3.083 | Grad_l2 --> 0.755 | Weights_l2 --> 8635.030 | Lr --> 0.007 | Seconds_per_step --> 4.830 |
[2024-08-09 18:03:44,919][Main][INFO] - [train] Step 7200 out of 80000 | Loss --> 3.073 | Grad_l2 --> 0.735 | Weights_l2 --> 8635.760 | Lr --> 0.007 | Seconds_per_step --> 4.812 |
[2024-08-09 18:07:37,774][Main][INFO] - [train] Step 7250 out of 80000 | Loss --> 3.055 | Grad_l2 --> 0.718 | Weights_l2 --> 8636.493 | Lr --> 0.007 | Seconds_per_step --> 4.657 |
[2024-08-09 18:11:34,198][Main][INFO] - [train] Step 7300 out of 80000 | Loss --> 3.051 | Grad_l2 --> 0.721 | Weights_l2 --> 8637.245 | Lr --> 0.007 | Seconds_per_step --> 4.728 |
[2024-08-09 18:15:38,927][Main][INFO] - [train] Step 7350 out of 80000 | Loss --> 3.041 | Grad_l2 --> 0.762 | Weights_l2 --> 8637.991 | Lr --> 0.007 | Seconds_per_step --> 4.895 |
[2024-08-09 18:19:42,181][Main][INFO] - [train] Step 7400 out of 80000 | Loss --> 3.031 | Grad_l2 --> 0.720 | Weights_l2 --> 8638.728 | Lr --> 0.007 | Seconds_per_step --> 4.865 |
[2024-08-09 18:23:37,911][Main][INFO] - [train] Step 7450 out of 80000 | Loss --> 3.033 | Grad_l2 --> 0.718 | Weights_l2 --> 8639.471 | Lr --> 0.007 | Seconds_per_step --> 4.715 |
[2024-08-09 18:27:38,146][Main][INFO] - [train] Step 7500 out of 80000 | Loss --> 3.020 | Grad_l2 --> 0.729 | Weights_l2 --> 8640.206 | Lr --> 0.007 | Seconds_per_step --> 4.805 |
[2024-08-09 18:31:39,590][Main][INFO] - [train] Step 7550 out of 80000 | Loss --> 3.004 | Grad_l2 --> 0.734 | Weights_l2 --> 8640.967 | Lr --> 0.007 | Seconds_per_step --> 4.829 |
[2024-08-09 18:35:32,805][Main][INFO] - [train] Step 7600 out of 80000 | Loss --> 2.986 | Grad_l2 --> 0.714 | Weights_l2 --> 8641.711 | Lr --> 0.007 | Seconds_per_step --> 4.664 |
[2024-08-09 18:39:28,080][Main][INFO] - [train] Step 7650 out of 80000 | Loss --> 2.994 | Grad_l2 --> 0.743 | Weights_l2 --> 8642.483 | Lr --> 0.007 | Seconds_per_step --> 4.705 |
[2024-08-09 18:43:37,815][Main][INFO] - [train] Step 7700 out of 80000 | Loss --> 2.980 | Grad_l2 --> 0.699 | Weights_l2 --> 8643.242 | Lr --> 0.007 | Seconds_per_step --> 4.995 |
[2024-08-09 18:47:42,799][Main][INFO] - [train] Step 7750 out of 80000 | Loss --> 2.976 | Grad_l2 --> 0.725 | Weights_l2 --> 8643.993 | Lr --> 0.007 | Seconds_per_step --> 4.900 |
[2024-08-09 18:51:34,464][Main][INFO] - [train] Step 7800 out of 80000 | Loss --> 2.963 | Grad_l2 --> 0.699 | Weights_l2 --> 8644.781 | Lr --> 0.007 | Seconds_per_step --> 4.633 |
[2024-08-09 18:55:32,534][Main][INFO] - [train] Step 7850 out of 80000 | Loss --> 2.954 | Grad_l2 --> 0.706 | Weights_l2 --> 8645.547 | Lr --> 0.007 | Seconds_per_step --> 4.761 |
[2024-08-09 18:59:39,507][Main][INFO] - [train] Step 7900 out of 80000 | Loss --> 2.947 | Grad_l2 --> 0.689 | Weights_l2 --> 8646.333 | Lr --> 0.007 | Seconds_per_step --> 4.939 |
[2024-08-09 19:03:32,747][Main][INFO] - [train] Step 7950 out of 80000 | Loss --> 2.935 | Grad_l2 --> 0.701 | Weights_l2 --> 8647.099 | Lr --> 0.007 | Seconds_per_step --> 4.665 |
[2024-08-09 19:07:42,994][Main][INFO] - [train] Step 8000 out of 80000 | Loss --> 2.940 | Grad_l2 --> 0.709 | Weights_l2 --> 8647.889 | Lr --> 0.007 | Seconds_per_step --> 5.005 |
[2024-08-09 19:11:49,930][Main][INFO] - [train] Step 8050 out of 80000 | Loss --> 2.919 | Grad_l2 --> 0.699 | Weights_l2 --> 8648.663 | Lr --> 0.007 | Seconds_per_step --> 4.939 |
[2024-08-09 19:16:03,022][Main][INFO] - [train] Step 8100 out of 80000 | Loss --> 2.916 | Grad_l2 --> 0.690 | Weights_l2 --> 8649.453 | Lr --> 0.007 | Seconds_per_step --> 5.062 |
[2024-08-09 19:20:05,203][Main][INFO] - [train] Step 8150 out of 80000 | Loss --> 2.914 | Grad_l2 --> 0.712 | Weights_l2 --> 8650.238 | Lr --> 0.007 | Seconds_per_step --> 4.844 |
[2024-08-09 19:23:57,007][Main][INFO] - [train] Step 8200 out of 80000 | Loss --> 2.903 | Grad_l2 --> 0.727 | Weights_l2 --> 8651.038 | Lr --> 0.007 | Seconds_per_step --> 4.636 |
[2024-08-09 19:28:02,052][Main][INFO] - [train] Step 8250 out of 80000 | Loss --> 2.896 | Grad_l2 --> 0.691 | Weights_l2 --> 8651.842 | Lr --> 0.007 | Seconds_per_step --> 4.901 |
[2024-08-09 19:32:01,708][Main][INFO] - [train] Step 8300 out of 80000 | Loss --> 2.889 | Grad_l2 --> 0.703 | Weights_l2 --> 8652.661 | Lr --> 0.007 | Seconds_per_step --> 4.793 |
[2024-08-09 19:35:54,542][Main][INFO] - [train] Step 8350 out of 80000 | Loss --> 2.882 | Grad_l2 --> 0.672 | Weights_l2 --> 8653.459 | Lr --> 0.007 | Seconds_per_step --> 4.657 |
[2024-08-09 19:39:53,565][Main][INFO] - [train] Step 8400 out of 80000 | Loss --> 2.861 | Grad_l2 --> 0.676 | Weights_l2 --> 8654.299 | Lr --> 0.007 | Seconds_per_step --> 4.780 |
[2024-08-09 19:43:54,929][Main][INFO] - [train] Step 8450 out of 80000 | Loss --> 2.870 | Grad_l2 --> 0.680 | Weights_l2 --> 8655.106 | Lr --> 0.007 | Seconds_per_step --> 4.827 |
[2024-08-09 19:47:46,390][Main][INFO] - [train] Step 8500 out of 80000 | Loss --> 2.857 | Grad_l2 --> 0.673 | Weights_l2 --> 8655.929 | Lr --> 0.007 | Seconds_per_step --> 4.629 |
[2024-08-09 19:51:41,774][Main][INFO] - [train] Step 8550 out of 80000 | Loss --> 2.847 | Grad_l2 --> 0.674 | Weights_l2 --> 8656.760 | Lr --> 0.007 | Seconds_per_step --> 4.708 |
[2024-08-09 19:55:50,508][Main][INFO] - [train] Step 8600 out of 80000 | Loss --> 2.838 | Grad_l2 --> 0.679 | Weights_l2 --> 8657.613 | Lr --> 0.007 | Seconds_per_step --> 4.975 |
[2024-08-09 19:59:55,899][Main][INFO] - [train] Step 8650 out of 80000 | Loss --> 2.847 | Grad_l2 --> 0.668 | Weights_l2 --> 8658.480 | Lr --> 0.007 | Seconds_per_step --> 4.908 |
[2024-08-09 20:03:46,940][Main][INFO] - [train] Step 8700 out of 80000 | Loss --> 2.834 | Grad_l2 --> 0.689 | Weights_l2 --> 8659.322 | Lr --> 0.007 | Seconds_per_step --> 4.621 |
[2024-08-09 20:07:40,599][Main][INFO] - [train] Step 8750 out of 80000 | Loss --> 2.814 | Grad_l2 --> 0.665 | Weights_l2 --> 8660.208 | Lr --> 0.007 | Seconds_per_step --> 4.673 |
[2024-08-09 20:11:45,521][Main][INFO] - [train] Step 8800 out of 80000 | Loss --> 2.817 | Grad_l2 --> 0.645 | Weights_l2 --> 8661.057 | Lr --> 0.008 | Seconds_per_step --> 4.898 |
[2024-08-09 20:15:34,178][Main][INFO] - [train] Step 8850 out of 80000 | Loss --> 2.807 | Grad_l2 --> 0.662 | Weights_l2 --> 8661.931 | Lr --> 0.008 | Seconds_per_step --> 4.573 |
[2024-08-09 20:19:09,957][Main][INFO] - [train] Step 8900 out of 80000 | Loss --> 2.806 | Grad_l2 --> 0.671 | Weights_l2 --> 8662.810 | Lr --> 0.008 | Seconds_per_step --> 4.316 |
[2024-08-09 20:22:37,497][Main][INFO] - [train] Step 8950 out of 80000 | Loss --> 2.799 | Grad_l2 --> 0.656 | Weights_l2 --> 8663.699 | Lr --> 0.008 | Seconds_per_step --> 4.151 |
[2024-08-09 20:26:01,302][Main][INFO] - [train] Step 9000 out of 80000 | Loss --> 2.796 | Grad_l2 --> 0.657 | Weights_l2 --> 8664.591 | Lr --> 0.008 | Seconds_per_step --> 4.076 |
[2024-08-09 20:29:28,057][Main][INFO] - [train] Step 9050 out of 80000 | Loss --> 2.787 | Grad_l2 --> 0.650 | Weights_l2 --> 8665.480 | Lr --> 0.008 | Seconds_per_step --> 4.135 |
[2024-08-09 20:32:55,736][Main][INFO] - [train] Step 9100 out of 80000 | Loss --> 2.771 | Grad_l2 --> 0.668 | Weights_l2 --> 8666.372 | Lr --> 0.008 | Seconds_per_step --> 4.154 |
[2024-08-09 20:36:26,470][Main][INFO] - [train] Step 9150 out of 80000 | Loss --> 2.762 | Grad_l2 --> 0.630 | Weights_l2 --> 8667.256 | Lr --> 0.008 | Seconds_per_step --> 4.215 |
[2024-08-09 20:40:02,302][Main][INFO] - [train] Step 9200 out of 80000 | Loss --> 2.764 | Grad_l2 --> 0.668 | Weights_l2 --> 8668.181 | Lr --> 0.008 | Seconds_per_step --> 4.317 |
[2024-08-09 20:43:38,319][Main][INFO] - [train] Step 9250 out of 80000 | Loss --> 2.760 | Grad_l2 --> 0.658 | Weights_l2 --> 8669.118 | Lr --> 0.008 | Seconds_per_step --> 4.320 |
[2024-08-09 20:47:12,593][Main][INFO] - [train] Step 9300 out of 80000 | Loss --> 2.754 | Grad_l2 --> 0.631 | Weights_l2 --> 8670.046 | Lr --> 0.008 | Seconds_per_step --> 4.285 |
[2024-08-09 20:50:50,547][Main][INFO] - [train] Step 9350 out of 80000 | Loss --> 2.748 | Grad_l2 --> 0.659 | Weights_l2 --> 8670.961 | Lr --> 0.008 | Seconds_per_step --> 4.359 |
[2024-08-09 20:54:27,164][Main][INFO] - [train] Step 9400 out of 80000 | Loss --> 2.745 | Grad_l2 --> 0.645 | Weights_l2 --> 8671.908 | Lr --> 0.008 | Seconds_per_step --> 4.332 |
[2024-08-09 20:57:57,318][Main][INFO] - [train] Step 9450 out of 80000 | Loss --> 2.734 | Grad_l2 --> 0.651 | Weights_l2 --> 8672.837 | Lr --> 0.008 | Seconds_per_step --> 4.203 |
[2024-08-09 21:01:27,114][Main][INFO] - [train] Step 9500 out of 80000 | Loss --> 2.724 | Grad_l2 --> 0.651 | Weights_l2 --> 8673.783 | Lr --> 0.008 | Seconds_per_step --> 4.196 |
[2024-08-09 21:05:01,540][Main][INFO] - [train] Step 9550 out of 80000 | Loss --> 2.723 | Grad_l2 --> 0.635 | Weights_l2 --> 8674.757 | Lr --> 0.008 | Seconds_per_step --> 4.289 |
[2024-08-09 21:08:31,178][Main][INFO] - [train] Step 9600 out of 80000 | Loss --> 2.707 | Grad_l2 --> 0.633 | Weights_l2 --> 8675.741 | Lr --> 0.008 | Seconds_per_step --> 4.193 |
[2024-08-09 21:12:04,549][Main][INFO] - [train] Step 9650 out of 80000 | Loss --> 2.705 | Grad_l2 --> 0.662 | Weights_l2 --> 8676.698 | Lr --> 0.008 | Seconds_per_step --> 4.267 |
[2024-08-09 21:15:31,359][Main][INFO] - [train] Step 9700 out of 80000 | Loss --> 2.701 | Grad_l2 --> 0.620 | Weights_l2 --> 8677.665 | Lr --> 0.008 | Seconds_per_step --> 4.136 |
[2024-08-09 21:19:05,681][Main][INFO] - [train] Step 9750 out of 80000 | Loss --> 2.696 | Grad_l2 --> 0.635 | Weights_l2 --> 8678.669 | Lr --> 0.008 | Seconds_per_step --> 4.286 |
[2024-08-09 21:22:39,126][Main][INFO] - [train] Step 9800 out of 80000 | Loss --> 2.698 | Grad_l2 --> 0.652 | Weights_l2 --> 8679.660 | Lr --> 0.008 | Seconds_per_step --> 4.269 |
[2024-08-09 21:26:12,926][Main][INFO] - [train] Step 9850 out of 80000 | Loss --> 2.691 | Grad_l2 --> 0.629 | Weights_l2 --> 8680.657 | Lr --> 0.008 | Seconds_per_step --> 4.276 |
[2024-08-09 21:29:43,650][Main][INFO] - [train] Step 9900 out of 80000 | Loss --> 2.683 | Grad_l2 --> 0.639 | Weights_l2 --> 8681.671 | Lr --> 0.008 | Seconds_per_step --> 4.214 |
[2024-08-09 21:33:15,612][Main][INFO] - [train] Step 9950 out of 80000 | Loss --> 2.678 | Grad_l2 --> 0.624 | Weights_l2 --> 8682.710 | Lr --> 0.008 | Seconds_per_step --> 4.239 |
[2024-08-09 21:36:48,784][Main][INFO] - [train] Step 10000 out of 80000 | Loss --> 2.683 | Grad_l2 --> 0.631 | Weights_l2 --> 8683.746 | Lr --> 0.008 | Seconds_per_step --> 4.263 |
[2024-08-09 21:36:48,785][accelerate.accelerator][INFO] - Saving current state to checkpoint-pt-10000
[2024-08-09 21:36:48,789][accelerate.utils.other][WARNING] - Removed shared tensor {'encoder.embed_tokens.weight', 'decoder.embed_tokens.weight'} while saving.
This should be OK, but check by verifying that you don't receive any warning while reloading [2024-08-09 21:36:50,921][accelerate.checkpointing][INFO] - Model weights saved in checkpoint-pt-10000/model.safetensors [2024-08-09 21:36:54,146][accelerate.checkpointing][INFO] - Optimizer state saved in checkpoint-pt-10000/optimizer.bin [2024-08-09 21:36:54,146][accelerate.checkpointing][INFO] - Scheduler state saved in checkpoint-pt-10000/scheduler.bin [2024-08-09 21:36:54,146][accelerate.checkpointing][INFO] - Sampler state for dataloader 0 saved in checkpoint-pt-10000/sampler.bin [2024-08-09 21:36:54,147][accelerate.checkpointing][INFO] - Sampler state for dataloader 1 saved in checkpoint-pt-10000/sampler_1.bin [2024-08-09 21:36:54,147][accelerate.checkpointing][INFO] - Random states saved in checkpoint-pt-10000/random_states_0.pkl [2024-08-09 21:40:24,314][Main][INFO] - [train] Step 10050 out of 80000 | Loss --> 2.672 | Grad_l2 --> 0.620 | Weights_l2 --> 8684.763 | Lr --> 0.008 | Seconds_per_step --> 4.311 | [2024-08-09 21:43:54,934][Main][INFO] - [train] Step 10100 out of 80000 | Loss --> 2.668 | Grad_l2 --> 0.630 | Weights_l2 --> 8685.788 | Lr --> 0.008 | Seconds_per_step --> 4.212 | [2024-08-09 21:47:26,893][Main][INFO] - [train] Step 10150 out of 80000 | Loss --> 2.664 | Grad_l2 --> 0.622 | Weights_l2 --> 8686.819 | Lr --> 0.008 | Seconds_per_step --> 4.239 | [2024-08-09 21:50:55,047][Main][INFO] - [train] Step 10200 out of 80000 | Loss --> 2.647 | Grad_l2 --> 0.609 | Weights_l2 --> 8687.859 | Lr --> 0.008 | Seconds_per_step --> 4.163 | [2024-08-09 21:54:22,462][Main][INFO] - [train] Step 10250 out of 80000 | Loss --> 2.655 | Grad_l2 --> 0.613 | Weights_l2 --> 8688.883 | Lr --> 0.008 | Seconds_per_step --> 4.148 | [2024-08-09 21:57:52,835][Main][INFO] - [train] Step 10300 out of 80000 | Loss --> 2.637 | Grad_l2 --> 0.623 | Weights_l2 --> 8689.917 | Lr --> 0.008 | Seconds_per_step --> 4.207 | [2024-08-09 22:01:30,833][Main][INFO] - [train] Step 10350 out of 80000 
| Loss --> 2.650 | Grad_l2 --> 0.636 | Weights_l2 --> 8690.965 | Lr --> 0.008 | Seconds_per_step --> 4.360 | [2024-08-09 22:04:59,449][Main][INFO] - [train] Step 10400 out of 80000 | Loss --> 2.630 | Grad_l2 --> 0.619 | Weights_l2 --> 8691.976 | Lr --> 0.008 | Seconds_per_step --> 4.172 | [2024-08-09 22:08:29,303][Main][INFO] - [train] Step 10450 out of 80000 | Loss --> 2.617 | Grad_l2 --> 0.615 | Weights_l2 --> 8693.000 | Lr --> 0.008 | Seconds_per_step --> 4.197 | [2024-08-09 22:12:03,306][Main][INFO] - [train] Step 10500 out of 80000 | Loss --> 2.627 | Grad_l2 --> 0.615 | Weights_l2 --> 8694.037 | Lr --> 0.008 | Seconds_per_step --> 4.280 | [2024-08-09 22:15:37,789][Main][INFO] - [train] Step 10550 out of 80000 | Loss --> 2.612 | Grad_l2 --> 0.594 | Weights_l2 --> 8695.071 | Lr --> 0.008 | Seconds_per_step --> 4.290 | [2024-08-09 22:19:13,830][Main][INFO] - [train] Step 10600 out of 80000 | Loss --> 2.599 | Grad_l2 --> 0.608 | Weights_l2 --> 8696.095 | Lr --> 0.008 | Seconds_per_step --> 4.321 | [2024-08-09 22:22:47,537][Main][INFO] - [train] Step 10650 out of 80000 | Loss --> 2.598 | Grad_l2 --> 0.619 | Weights_l2 --> 8697.144 | Lr --> 0.008 | Seconds_per_step --> 4.274 | [2024-08-09 22:26:27,089][Main][INFO] - [train] Step 10700 out of 80000 | Loss --> 2.602 | Grad_l2 --> 0.627 | Weights_l2 --> 8698.176 | Lr --> 0.008 | Seconds_per_step --> 4.391 | [2024-08-09 22:30:08,291][Main][INFO] - [train] Step 10750 out of 80000 | Loss --> 2.598 | Grad_l2 --> 0.603 | Weights_l2 --> 8699.195 | Lr --> 0.008 | Seconds_per_step --> 4.424 | [2024-08-09 22:33:50,515][Main][INFO] - [train] Step 10800 out of 80000 | Loss --> 2.600 | Grad_l2 --> 0.615 | Weights_l2 --> 8700.255 | Lr --> 0.008 | Seconds_per_step --> 4.444 | [2024-08-09 22:37:23,733][Main][INFO] - [train] Step 10850 out of 80000 | Loss --> 2.588 | Grad_l2 --> 0.604 | Weights_l2 --> 8701.311 | Lr --> 0.008 | Seconds_per_step --> 4.264 | [2024-08-09 22:40:59,607][Main][INFO] - [train] Step 10900 out of 80000 | Loss 
--> 2.585 | Grad_l2 --> 0.605 | Weights_l2 --> 8702.327 | Lr --> 0.008 | Seconds_per_step --> 4.317 | [2024-08-09 22:44:33,158][Main][INFO] - [train] Step 10950 out of 80000 | Loss --> 2.581 | Grad_l2 --> 0.595 | Weights_l2 --> 8703.360 | Lr --> 0.008 | Seconds_per_step --> 4.271 | [2024-08-09 22:48:11,589][Main][INFO] - [train] Step 11000 out of 80000 | Loss --> 2.580 | Grad_l2 --> 0.601 | Weights_l2 --> 8704.410 | Lr --> 0.008 | Seconds_per_step --> 4.369 | [2024-08-09 22:51:45,840][Main][INFO] - [train] Step 11050 out of 80000 | Loss --> 2.578 | Grad_l2 --> 0.587 | Weights_l2 --> 8705.448 | Lr --> 0.008 | Seconds_per_step --> 4.285 | [2024-08-09 22:55:25,388][Main][INFO] - [train] Step 11100 out of 80000 | Loss --> 2.574 | Grad_l2 --> 0.599 | Weights_l2 --> 8706.475 | Lr --> 0.008 | Seconds_per_step --> 4.391 | [2024-08-09 22:58:56,339][Main][INFO] - [train] Step 11150 out of 80000 | Loss --> 2.574 | Grad_l2 --> 0.599 | Weights_l2 --> 8707.487 | Lr --> 0.008 | Seconds_per_step --> 4.219 | [2024-08-09 23:02:28,434][Main][INFO] - [train] Step 11200 out of 80000 | Loss --> 2.577 | Grad_l2 --> 0.600 | Weights_l2 --> 8708.529 | Lr --> 0.008 | Seconds_per_step --> 4.242 | [2024-08-09 23:06:01,747][Main][INFO] - [train] Step 11250 out of 80000 | Loss --> 2.563 | Grad_l2 --> 0.582 | Weights_l2 --> 8709.582 | Lr --> 0.008 | Seconds_per_step --> 4.266 | [2024-08-09 23:09:36,821][Main][INFO] - [train] Step 11300 out of 80000 | Loss --> 2.567 | Grad_l2 --> 0.559 | Weights_l2 --> 8710.620 | Lr --> 0.008 | Seconds_per_step --> 4.301 | [2024-08-09 23:13:05,158][Main][INFO] - [train] Step 11350 out of 80000 | Loss --> 2.561 | Grad_l2 --> 0.598 | Weights_l2 --> 8711.669 | Lr --> 0.008 | Seconds_per_step --> 4.167 | [2024-08-09 23:16:34,505][Main][INFO] - [train] Step 11400 out of 80000 | Loss --> 2.555 | Grad_l2 --> 0.588 | Weights_l2 --> 8712.697 | Lr --> 0.008 | Seconds_per_step --> 4.187 | [2024-08-09 23:20:05,626][Main][INFO] - [train] Step 11450 out of 80000 | Loss --> 
2.546 | Grad_l2 --> 0.582 | Weights_l2 --> 8713.753 | Lr --> 0.008 | Seconds_per_step --> 4.222 | [2024-08-09 23:23:40,137][Main][INFO] - [train] Step 11500 out of 80000 | Loss --> 2.549 | Grad_l2 --> 0.583 | Weights_l2 --> 8714.804 | Lr --> 0.008 | Seconds_per_step --> 4.290 | [2024-08-09 23:27:11,574][Main][INFO] - [train] Step 11550 out of 80000 | Loss --> 2.536 | Grad_l2 --> 0.582 | Weights_l2 --> 8715.826 | Lr --> 0.008 | Seconds_per_step --> 4.229 | [2024-08-09 23:30:49,636][Main][INFO] - [train] Step 11600 out of 80000 | Loss --> 2.538 | Grad_l2 --> 0.576 | Weights_l2 --> 8716.881 | Lr --> 0.008 | Seconds_per_step --> 4.361 | [2024-08-09 23:34:19,586][Main][INFO] - [train] Step 11650 out of 80000 | Loss --> 2.539 | Grad_l2 --> 0.580 | Weights_l2 --> 8717.926 | Lr --> 0.008 | Seconds_per_step --> 4.199 | [2024-08-09 23:37:49,139][Main][INFO] - [train] Step 11700 out of 80000 | Loss --> 2.524 | Grad_l2 --> 0.585 | Weights_l2 --> 8718.968 | Lr --> 0.008 | Seconds_per_step --> 4.191 | [2024-08-09 23:41:15,748][Main][INFO] - [train] Step 11750 out of 80000 | Loss --> 2.531 | Grad_l2 --> 0.601 | Weights_l2 --> 8720.024 | Lr --> 0.008 | Seconds_per_step --> 4.132 | [2024-08-09 23:44:46,392][Main][INFO] - [train] Step 11800 out of 80000 | Loss --> 2.519 | Grad_l2 --> 0.586 | Weights_l2 --> 8721.064 | Lr --> 0.008 | Seconds_per_step --> 4.213 | [2024-08-09 23:48:21,044][Main][INFO] - [train] Step 11850 out of 80000 | Loss --> 2.516 | Grad_l2 --> 0.576 | Weights_l2 --> 8722.098 | Lr --> 0.008 | Seconds_per_step --> 4.293 | [2024-08-09 23:51:45,233][Main][INFO] - [train] Step 11900 out of 80000 | Loss --> 2.509 | Grad_l2 --> 0.566 | Weights_l2 --> 8723.110 | Lr --> 0.008 | Seconds_per_step --> 4.084 | [2024-08-09 23:55:18,373][Main][INFO] - [train] Step 11950 out of 80000 | Loss --> 2.508 | Grad_l2 --> 0.605 | Weights_l2 --> 8724.151 | Lr --> 0.008 | Seconds_per_step --> 4.263 | [2024-08-09 23:58:51,710][Main][INFO] - [train] Step 12000 out of 80000 | Loss --> 2.510 | 
Grad_l2 --> 0.587 | Weights_l2 --> 8725.199 | Lr --> 0.008 | Seconds_per_step --> 4.267 | [2024-08-10 00:02:27,062][Main][INFO] - [train] Step 12050 out of 80000 | Loss --> 2.502 | Grad_l2 --> 0.573 | Weights_l2 --> 8726.242 | Lr --> 0.008 | Seconds_per_step --> 4.307 | [2024-08-10 00:05:54,127][Main][INFO] - [train] Step 12100 out of 80000 | Loss --> 2.496 | Grad_l2 --> 0.583 | Weights_l2 --> 8727.250 | Lr --> 0.008 | Seconds_per_step --> 4.141 | [2024-08-10 00:09:20,349][Main][INFO] - [train] Step 12150 out of 80000 | Loss --> 2.499 | Grad_l2 --> 0.553 | Weights_l2 --> 8728.275 | Lr --> 0.008 | Seconds_per_step --> 4.124 | [2024-08-10 00:12:28,941][Main][INFO] - [train] Step 12200 out of 80000 | Loss --> 2.503 | Grad_l2 --> 0.561 | Weights_l2 --> 8729.279 | Lr --> 0.008 | Seconds_per_step --> 3.772 | [2024-08-10 00:15:19,261][Main][INFO] - [train] Step 12250 out of 80000 | Loss --> 2.494 | Grad_l2 --> 0.590 | Weights_l2 --> 8730.313 | Lr --> 0.008 | Seconds_per_step --> 3.406 | [2024-08-10 00:18:09,129][Main][INFO] - [train] Step 12300 out of 80000 | Loss --> 2.490 | Grad_l2 --> 0.552 | Weights_l2 --> 8731.341 | Lr --> 0.008 | Seconds_per_step --> 3.397 | [2024-08-10 00:20:58,085][Main][INFO] - [train] Step 12350 out of 80000 | Loss --> 2.487 | Grad_l2 --> 0.548 | Weights_l2 --> 8732.401 | Lr --> 0.008 | Seconds_per_step --> 3.379 | [2024-08-10 00:23:47,642][Main][INFO] - [train] Step 12400 out of 80000 | Loss --> 2.480 | Grad_l2 --> 0.542 | Weights_l2 --> 8733.439 | Lr --> 0.008 | Seconds_per_step --> 3.391 | [2024-08-10 00:26:37,898][Main][INFO] - [train] Step 12450 out of 80000 | Loss --> 2.481 | Grad_l2 --> 0.551 | Weights_l2 --> 8734.469 | Lr --> 0.008 | Seconds_per_step --> 3.405 | [2024-08-10 00:29:27,451][Main][INFO] - [train] Step 12500 out of 80000 | Loss --> 2.477 | Grad_l2 --> 0.558 | Weights_l2 --> 8735.510 | Lr --> 0.008 | Seconds_per_step --> 3.391 | [2024-08-10 00:32:17,116][Main][INFO] - [train] Step 12550 out of 80000 | Loss --> 2.478 | Grad_l2 
--> 0.549 | Weights_l2 --> 8736.541 | Lr --> 0.008 | Seconds_per_step --> 3.393 | [2024-08-10 00:35:06,730][Main][INFO] - [train] Step 12600 out of 80000 | Loss --> 2.470 | Grad_l2 --> 0.545 | Weights_l2 --> 8737.575 | Lr --> 0.008 | Seconds_per_step --> 3.392 | [2024-08-10 00:37:58,202][Main][INFO] - [train] Step 12650 out of 80000 | Loss --> 2.471 | Grad_l2 --> 0.547 | Weights_l2 --> 8738.595 | Lr --> 0.008 | Seconds_per_step --> 3.429 | [2024-08-10 00:40:47,794][Main][INFO] - [train] Step 12700 out of 80000 | Loss --> 2.462 | Grad_l2 --> 0.528 | Weights_l2 --> 8739.622 | Lr --> 0.008 | Seconds_per_step --> 3.392 | [2024-08-10 00:43:37,447][Main][INFO] - [train] Step 12750 out of 80000 | Loss --> 2.457 | Grad_l2 --> 0.533 | Weights_l2 --> 8740.657 | Lr --> 0.008 | Seconds_per_step --> 3.393 | [2024-08-10 00:46:25,836][Main][INFO] - [train] Step 12800 out of 80000 | Loss --> 2.461 | Grad_l2 --> 0.549 | Weights_l2 --> 8741.689 | Lr --> 0.008 | Seconds_per_step --> 3.368 | [2024-08-10 00:49:16,460][Main][INFO] - [train] Step 12850 out of 80000 | Loss --> 2.451 | Grad_l2 --> 0.531 | Weights_l2 --> 8742.743 | Lr --> 0.008 | Seconds_per_step --> 3.412 | [2024-08-10 00:52:06,465][Main][INFO] - [train] Step 12900 out of 80000 | Loss --> 2.453 | Grad_l2 --> 0.527 | Weights_l2 --> 8743.761 | Lr --> 0.008 | Seconds_per_step --> 3.400 | [2024-08-10 00:54:55,879][Main][INFO] - [train] Step 12950 out of 80000 | Loss --> 2.447 | Grad_l2 --> 0.520 | Weights_l2 --> 8744.791 | Lr --> 0.008 | Seconds_per_step --> 3.388 | [2024-08-10 00:57:44,034][Main][INFO] - [train] Step 13000 out of 80000 | Loss --> 2.448 | Grad_l2 --> 0.539 | Weights_l2 --> 8745.805 | Lr --> 0.008 | Seconds_per_step --> 3.363 | [2024-08-10 01:00:33,641][Main][INFO] - [train] Step 13050 out of 80000 | Loss --> 2.439 | Grad_l2 --> 0.511 | Weights_l2 --> 8746.858 | Lr --> 0.008 | Seconds_per_step --> 3.392 | [2024-08-10 01:03:22,747][Main][INFO] - [train] Step 13100 out of 80000 | Loss --> 2.436 | Grad_l2 --> 
0.524 | Weights_l2 --> 8747.888 | Lr --> 0.008 | Seconds_per_step --> 3.382 | [2024-08-10 01:06:11,723][Main][INFO] - [train] Step 13150 out of 80000 | Loss --> 2.438 | Grad_l2 --> 0.525 | Weights_l2 --> 8748.918 | Lr --> 0.008 | Seconds_per_step --> 3.380 | [2024-08-10 01:09:00,218][Main][INFO] - [train] Step 13200 out of 80000 | Loss --> 2.436 | Grad_l2 --> 0.519 | Weights_l2 --> 8749.966 | Lr --> 0.008 | Seconds_per_step --> 3.370 | [2024-08-10 01:11:49,462][Main][INFO] - [train] Step 13250 out of 80000 | Loss --> 2.437 | Grad_l2 --> 0.511 | Weights_l2 --> 8751.004 | Lr --> 0.008 | Seconds_per_step --> 3.385 | [2024-08-10 01:14:38,139][Main][INFO] - [train] Step 13300 out of 80000 | Loss --> 2.428 | Grad_l2 --> 0.512 | Weights_l2 --> 8752.040 | Lr --> 0.008 | Seconds_per_step --> 3.374 | [2024-08-10 01:17:26,878][Main][INFO] - [train] Step 13350 out of 80000 | Loss --> 2.426 | Grad_l2 --> 0.514 | Weights_l2 --> 8753.062 | Lr --> 0.008 | Seconds_per_step --> 3.375 | [2024-08-10 01:20:15,034][Main][INFO] - [train] Step 13400 out of 80000 | Loss --> 2.429 | Grad_l2 --> 0.509 | Weights_l2 --> 8754.109 | Lr --> 0.008 | Seconds_per_step --> 3.363 | [2024-08-10 01:23:03,363][Main][INFO] - [train] Step 13450 out of 80000 | Loss --> 2.423 | Grad_l2 --> 0.511 | Weights_l2 --> 8755.150 | Lr --> 0.008 | Seconds_per_step --> 3.367 | [2024-08-10 01:25:52,469][Main][INFO] - [train] Step 13500 out of 80000 | Loss --> 2.413 | Grad_l2 --> 0.502 | Weights_l2 --> 8756.209 | Lr --> 0.008 | Seconds_per_step --> 3.382 | [2024-08-10 01:28:40,740][Main][INFO] - [train] Step 13550 out of 80000 | Loss --> 2.422 | Grad_l2 --> 0.504 | Weights_l2 --> 8757.222 | Lr --> 0.008 | Seconds_per_step --> 3.365 | [2024-08-10 01:31:29,023][Main][INFO] - [train] Step 13600 out of 80000 | Loss --> 2.415 | Grad_l2 --> 0.495 | Weights_l2 --> 8758.279 | Lr --> 0.008 | Seconds_per_step --> 3.366 | [2024-08-10 01:34:18,913][Main][INFO] - [train] Step 13650 out of 80000 | Loss --> 2.420 | Grad_l2 --> 0.505 | 
Weights_l2 --> 8759.320 | Lr --> 0.008 | Seconds_per_step --> 3.398 | [2024-08-10 01:37:08,507][Main][INFO] - [train] Step 13700 out of 80000 | Loss --> 2.417 | Grad_l2 --> 0.500 | Weights_l2 --> 8760.334 | Lr --> 0.008 | Seconds_per_step --> 3.392 | [2024-08-10 01:39:56,547][Main][INFO] - [train] Step 13750 out of 80000 | Loss --> 2.406 | Grad_l2 --> 0.495 | Weights_l2 --> 8761.384 | Lr --> 0.008 | Seconds_per_step --> 3.361 | [2024-08-10 01:42:44,437][Main][INFO] - [train] Step 13800 out of 80000 | Loss --> 2.404 | Grad_l2 --> 0.501 | Weights_l2 --> 8762.410 | Lr --> 0.008 | Seconds_per_step --> 3.358 | [2024-08-10 01:45:33,358][Main][INFO] - [train] Step 13850 out of 80000 | Loss --> 2.397 | Grad_l2 --> 0.502 | Weights_l2 --> 8763.443 | Lr --> 0.008 | Seconds_per_step --> 3.378 | [2024-08-10 01:48:22,144][Main][INFO] - [train] Step 13900 out of 80000 | Loss --> 2.389 | Grad_l2 --> 0.492 | Weights_l2 --> 8764.465 | Lr --> 0.008 | Seconds_per_step --> 3.376 | [2024-08-10 01:51:09,910][Main][INFO] - [train] Step 13950 out of 80000 | Loss --> 2.391 | Grad_l2 --> 0.502 | Weights_l2 --> 8765.511 | Lr --> 0.008 | Seconds_per_step --> 3.355 | [2024-08-10 01:53:58,677][Main][INFO] - [train] Step 14000 out of 80000 | Loss --> 2.388 | Grad_l2 --> 0.497 | Weights_l2 --> 8766.530 | Lr --> 0.008 | Seconds_per_step --> 3.375 | [2024-08-10 01:56:48,499][Main][INFO] - [train] Step 14050 out of 80000 | Loss --> 2.373 | Grad_l2 --> 0.491 | Weights_l2 --> 8767.573 | Lr --> 0.008 | Seconds_per_step --> 3.396 | [2024-08-10 01:59:38,047][Main][INFO] - [train] Step 14100 out of 80000 | Loss --> 2.377 | Grad_l2 --> 0.503 | Weights_l2 --> 8768.588 | Lr --> 0.008 | Seconds_per_step --> 3.391 | [2024-08-10 02:02:27,734][Main][INFO] - [train] Step 14150 out of 80000 | Loss --> 2.378 | Grad_l2 --> 0.488 | Weights_l2 --> 8769.605 | Lr --> 0.008 | Seconds_per_step --> 3.394 | [2024-08-10 02:05:16,770][Main][INFO] - [train] Step 14200 out of 80000 | Loss --> 2.368 | Grad_l2 --> 0.496 | 
Weights_l2 --> 8770.616 | Lr --> 0.008 | Seconds_per_step --> 3.381 | [2024-08-10 02:08:05,603][Main][INFO] - [train] Step 14250 out of 80000 | Loss --> 2.373 | Grad_l2 --> 0.488 | Weights_l2 --> 8771.662 | Lr --> 0.008 | Seconds_per_step --> 3.377 | [2024-08-10 02:10:54,613][Main][INFO] - [train] Step 14300 out of 80000 | Loss --> 2.381 | Grad_l2 --> 0.490 | Weights_l2 --> 8772.676 | Lr --> 0.008 | Seconds_per_step --> 3.380 | [2024-08-10 02:13:44,831][Main][INFO] - [train] Step 14350 out of 80000 | Loss --> 2.371 | Grad_l2 --> 0.483 | Weights_l2 --> 8773.704 | Lr --> 0.008 | Seconds_per_step --> 3.404 | [2024-08-10 02:16:33,316][Main][INFO] - [train] Step 14400 out of 80000 | Loss --> 2.377 | Grad_l2 --> 0.487 | Weights_l2 --> 8774.735 | Lr --> 0.008 | Seconds_per_step --> 3.370 | [2024-08-10 02:19:21,805][Main][INFO] - [train] Step 14450 out of 80000 | Loss --> 2.373 | Grad_l2 --> 0.482 | Weights_l2 --> 8775.731 | Lr --> 0.008 | Seconds_per_step --> 3.370 | [2024-08-10 02:22:11,796][Main][INFO] - [train] Step 14500 out of 80000 | Loss --> 2.369 | Grad_l2 --> 0.499 | Weights_l2 --> 8776.744 | Lr --> 0.008 | Seconds_per_step --> 3.400 | [2024-08-10 02:25:01,398][Main][INFO] - [train] Step 14550 out of 80000 | Loss --> 2.364 | Grad_l2 --> 0.485 | Weights_l2 --> 8777.791 | Lr --> 0.008 | Seconds_per_step --> 3.392 | [2024-08-10 02:27:51,146][Main][INFO] - [train] Step 14600 out of 80000 | Loss --> 2.369 | Grad_l2 --> 0.481 | Weights_l2 --> 8778.816 | Lr --> 0.008 | Seconds_per_step --> 3.395 | [2024-08-10 02:30:40,279][Main][INFO] - [train] Step 14650 out of 80000 | Loss --> 2.373 | Grad_l2 --> 0.486 | Weights_l2 --> 8779.856 | Lr --> 0.008 | Seconds_per_step --> 3.383 | [2024-08-10 02:33:30,596][Main][INFO] - [train] Step 14700 out of 80000 | Loss --> 2.368 | Grad_l2 --> 0.488 | Weights_l2 --> 8780.880 | Lr --> 0.008 | Seconds_per_step --> 3.406 | [2024-08-10 02:36:19,985][Main][INFO] - [train] Step 14750 out of 80000 | Loss --> 2.364 | Grad_l2 --> 0.480 | 
Weights_l2 --> 8781.909 | Lr --> 0.008 | Seconds_per_step --> 3.388 | [2024-08-10 02:39:09,113][Main][INFO] - [train] Step 14800 out of 80000 | Loss --> 2.355 | Grad_l2 --> 0.490 | Weights_l2 --> 8782.954 | Lr --> 0.008 | Seconds_per_step --> 3.383 | [2024-08-10 02:41:58,266][Main][INFO] - [train] Step 14850 out of 80000 | Loss --> 2.363 | Grad_l2 --> 0.486 | Weights_l2 --> 8783.980 | Lr --> 0.008 | Seconds_per_step --> 3.383 | [2024-08-10 02:44:47,795][Main][INFO] - [train] Step 14900 out of 80000 | Loss --> 2.364 | Grad_l2 --> 0.479 | Weights_l2 --> 8784.994 | Lr --> 0.008 | Seconds_per_step --> 3.391 | [2024-08-10 02:47:37,307][Main][INFO] - [train] Step 14950 out of 80000 | Loss --> 2.362 | Grad_l2 --> 0.477 | Weights_l2 --> 8786.006 | Lr --> 0.008 | Seconds_per_step --> 3.390 | [2024-08-10 02:50:26,779][Main][INFO] - [train] Step 15000 out of 80000 | Loss --> 2.358 | Grad_l2 --> 0.483 | Weights_l2 --> 8787.026 | Lr --> 0.008 | Seconds_per_step --> 3.389 | [2024-08-10 02:50:26,780][accelerate.accelerator][INFO] - Saving current state to checkpoint-pt-15000 [2024-08-10 02:50:26,783][accelerate.utils.other][WARNING] - Removed shared tensor {'encoder.embed_tokens.weight', 'decoder.embed_tokens.weight'} while saving. 
This should be OK, but check by verifying that you don't receive any warning while reloading [2024-08-10 02:50:28,800][accelerate.checkpointing][INFO] - Model weights saved in checkpoint-pt-15000/model.safetensors [2024-08-10 02:50:31,549][accelerate.checkpointing][INFO] - Optimizer state saved in checkpoint-pt-15000/optimizer.bin [2024-08-10 02:50:31,550][accelerate.checkpointing][INFO] - Scheduler state saved in checkpoint-pt-15000/scheduler.bin [2024-08-10 02:50:31,550][accelerate.checkpointing][INFO] - Sampler state for dataloader 0 saved in checkpoint-pt-15000/sampler.bin [2024-08-10 02:50:31,550][accelerate.checkpointing][INFO] - Sampler state for dataloader 1 saved in checkpoint-pt-15000/sampler_1.bin [2024-08-10 02:50:31,551][accelerate.checkpointing][INFO] - Random states saved in checkpoint-pt-15000/random_states_0.pkl [2024-08-10 02:53:20,359][Main][INFO] - [train] Step 15050 out of 80000 | Loss --> 2.367 | Grad_l2 --> 0.472 | Weights_l2 --> 8788.057 | Lr --> 0.008 | Seconds_per_step --> 3.472 | [2024-08-10 02:56:09,018][Main][INFO] - [train] Step 15100 out of 80000 | Loss --> 2.358 | Grad_l2 --> 0.482 | Weights_l2 --> 8789.093 | Lr --> 0.008 | Seconds_per_step --> 3.373 | [2024-08-10 02:58:59,528][Main][INFO] - [train] Step 15150 out of 80000 | Loss --> 2.357 | Grad_l2 --> 0.474 | Weights_l2 --> 8790.099 | Lr --> 0.008 | Seconds_per_step --> 3.410 | [2024-08-10 03:01:49,372][Main][INFO] - [train] Step 15200 out of 80000 | Loss --> 2.361 | Grad_l2 --> 0.472 | Weights_l2 --> 8791.113 | Lr --> 0.008 | Seconds_per_step --> 3.397 | [2024-08-10 03:04:37,075][Main][INFO] - [train] Step 15250 out of 80000 | Loss --> 2.350 | Grad_l2 --> 0.478 | Weights_l2 --> 8792.148 | Lr --> 0.008 | Seconds_per_step --> 3.354 | [2024-08-10 03:07:24,386][Main][INFO] - [train] Step 15300 out of 80000 | Loss --> 2.356 | Grad_l2 --> 0.479 | Weights_l2 --> 8793.148 | Lr --> 0.008 | Seconds_per_step --> 3.346 | [2024-08-10 03:10:13,428][Main][INFO] - [train] Step 15350 out of 80000 
| Loss --> 2.350 | Grad_l2 --> 0.479 | Weights_l2 --> 8794.155 | Lr --> 0.008 | Seconds_per_step --> 3.381 | [2024-08-10 03:13:02,393][Main][INFO] - [train] Step 15400 out of 80000 | Loss --> 2.347 | Grad_l2 --> 0.469 | Weights_l2 --> 8795.175 | Lr --> 0.008 | Seconds_per_step --> 3.379 | [2024-08-10 03:15:51,829][Main][INFO] - [train] Step 15450 out of 80000 | Loss --> 2.347 | Grad_l2 --> 0.461 | Weights_l2 --> 8796.188 | Lr --> 0.008 | Seconds_per_step --> 3.389 | [2024-08-10 03:18:41,239][Main][INFO] - [train] Step 15500 out of 80000 | Loss --> 2.349 | Grad_l2 --> 0.468 | Weights_l2 --> 8797.217 | Lr --> 0.008 | Seconds_per_step --> 3.388 | [2024-08-10 03:21:30,852][Main][INFO] - [train] Step 15550 out of 80000 | Loss --> 2.341 | Grad_l2 --> 0.466 | Weights_l2 --> 8798.212 | Lr --> 0.008 | Seconds_per_step --> 3.392 | [2024-08-10 03:24:19,122][Main][INFO] - [train] Step 15600 out of 80000 | Loss --> 2.345 | Grad_l2 --> 0.472 | Weights_l2 --> 8799.202 | Lr --> 0.008 | Seconds_per_step --> 3.365 | [2024-08-10 03:27:08,990][Main][INFO] - [train] Step 15650 out of 80000 | Loss --> 2.350 | Grad_l2 --> 0.470 | Weights_l2 --> 8800.214 | Lr --> 0.008 | Seconds_per_step --> 3.397 | [2024-08-10 03:29:58,136][Main][INFO] - [train] Step 15700 out of 80000 | Loss --> 2.338 | Grad_l2 --> 0.473 | Weights_l2 --> 8801.228 | Lr --> 0.008 | Seconds_per_step --> 3.383 | [2024-08-10 03:32:47,841][Main][INFO] - [train] Step 15750 out of 80000 | Loss --> 2.335 | Grad_l2 --> 0.456 | Weights_l2 --> 8802.245 | Lr --> 0.008 | Seconds_per_step --> 3.394 | [2024-08-10 03:35:36,029][Main][INFO] - [train] Step 15800 out of 80000 | Loss --> 2.332 | Grad_l2 --> 0.454 | Weights_l2 --> 8803.247 | Lr --> 0.008 | Seconds_per_step --> 3.364 | [2024-08-10 03:38:25,696][Main][INFO] - [train] Step 15850 out of 80000 | Loss --> 2.329 | Grad_l2 --> 0.468 | Weights_l2 --> 8804.255 | Lr --> 0.008 | Seconds_per_step --> 3.393 | [2024-08-10 03:41:14,705][Main][INFO] - [train] Step 15900 out of 80000 | Loss 
--> 2.344 | Grad_l2 --> 0.771 | Weights_l2 --> 8805.210 | Lr --> 0.008 | Seconds_per_step --> 3.380 | [2024-08-10 03:44:05,016][Main][INFO] - [train] Step 15950 out of 80000 | Loss --> 2.336 | Grad_l2 --> 0.468 | Weights_l2 --> 8806.198 | Lr --> 0.008 | Seconds_per_step --> 3.406 | [2024-08-10 03:46:54,039][Main][INFO] - [train] Step 16000 out of 80000 | Loss --> 2.322 | Grad_l2 --> 0.466 | Weights_l2 --> 8807.208 | Lr --> 0.008 | Seconds_per_step --> 3.380 | [2024-08-10 03:49:43,020][Main][INFO] - [train] Step 16050 out of 80000 | Loss --> 2.327 | Grad_l2 --> 0.461 | Weights_l2 --> 8808.179 | Lr --> 0.008 | Seconds_per_step --> 3.380 | [2024-08-10 03:52:31,751][Main][INFO] - [train] Step 16100 out of 80000 | Loss --> 2.335 | Grad_l2 --> 0.464 | Weights_l2 --> 8809.180 | Lr --> 0.008 | Seconds_per_step --> 3.375 | [2024-08-10 03:55:21,300][Main][INFO] - [train] Step 16150 out of 80000 | Loss --> 2.332 | Grad_l2 --> 0.459 | Weights_l2 --> 8810.175 | Lr --> 0.008 | Seconds_per_step --> 3.391 | [2024-08-10 03:58:10,233][Main][INFO] - [train] Step 16200 out of 80000 | Loss --> 2.330 | Grad_l2 --> 0.459 | Weights_l2 --> 8811.153 | Lr --> 0.008 | Seconds_per_step --> 3.379 | [2024-08-10 04:00:58,805][Main][INFO] - [train] Step 16250 out of 80000 | Loss --> 2.328 | Grad_l2 --> 0.452 | Weights_l2 --> 8812.148 | Lr --> 0.008 | Seconds_per_step --> 3.371 | [2024-08-10 04:03:47,753][Main][INFO] - [train] Step 16300 out of 80000 | Loss --> 2.327 | Grad_l2 --> 0.457 | Weights_l2 --> 8813.155 | Lr --> 0.008 | Seconds_per_step --> 3.379 | [2024-08-10 04:06:37,968][Main][INFO] - [train] Step 16350 out of 80000 | Loss --> 2.315 | Grad_l2 --> 0.457 | Weights_l2 --> 8814.120 | Lr --> 0.008 | Seconds_per_step --> 3.404 | [2024-08-10 04:09:27,345][Main][INFO] - [train] Step 16400 out of 80000 | Loss --> 2.323 | Grad_l2 --> 0.451 | Weights_l2 --> 8815.101 | Lr --> 0.008 | Seconds_per_step --> 3.388 | [2024-08-10 04:12:16,639][Main][INFO] - [train] Step 16450 out of 80000 | Loss --> 
2.323 | Grad_l2 --> 0.454 | Weights_l2 --> 8816.109 | Lr --> 0.008 | Seconds_per_step --> 3.386 | [2024-08-10 04:15:06,000][Main][INFO] - [train] Step 16500 out of 80000 | Loss --> 2.316 | Grad_l2 --> 0.461 | Weights_l2 --> 8817.094 | Lr --> 0.008 | Seconds_per_step --> 3.387 | [2024-08-10 04:18:00,644][Main][INFO] - [train] Step 16550 out of 80000 | Loss --> 2.316 | Grad_l2 --> 0.454 | Weights_l2 --> 8818.060 | Lr --> 0.008 | Seconds_per_step --> 3.493 | [2024-08-10 04:20:52,878][Main][INFO] - [train] Step 16600 out of 80000 | Loss --> 2.325 | Grad_l2 --> 0.447 | Weights_l2 --> 8819.026 | Lr --> 0.008 | Seconds_per_step --> 3.445 | [2024-08-10 04:23:41,401][Main][INFO] - [train] Step 16650 out of 80000 | Loss --> 2.310 | Grad_l2 --> 0.456 | Weights_l2 --> 8820.003 | Lr --> 0.008 | Seconds_per_step --> 3.370 | [2024-08-10 04:26:30,469][Main][INFO] - [train] Step 16700 out of 80000 | Loss --> 2.312 | Grad_l2 --> 0.451 | Weights_l2 --> 8821.005 | Lr --> 0.008 | Seconds_per_step --> 3.381 | [2024-08-10 04:29:19,628][Main][INFO] - [train] Step 16750 out of 80000 | Loss --> 2.324 | Grad_l2 --> 0.451 | Weights_l2 --> 8821.988 | Lr --> 0.008 | Seconds_per_step --> 3.383 | [2024-08-10 04:32:09,203][Main][INFO] - [train] Step 16800 out of 80000 | Loss --> 2.308 | Grad_l2 --> 0.450 | Weights_l2 --> 8822.952 | Lr --> 0.008 | Seconds_per_step --> 3.391 | [2024-08-10 04:35:05,100][Main][INFO] - [train] Step 16850 out of 80000 | Loss --> 2.294 | Grad_l2 --> 0.446 | Weights_l2 --> 8823.904 | Lr --> 0.008 | Seconds_per_step --> 3.518 | [2024-08-10 04:37:58,342][Main][INFO] - [train] Step 16900 out of 80000 | Loss --> 2.310 | Grad_l2 --> 0.454 | Weights_l2 --> 8824.866 | Lr --> 0.008 | Seconds_per_step --> 3.465 | [2024-08-10 04:40:55,063][Main][INFO] - [train] Step 16950 out of 80000 | Loss --> 2.294 | Grad_l2 --> 0.449 | Weights_l2 --> 8825.837 | Lr --> 0.008 | Seconds_per_step --> 3.534 | [2024-08-10 04:44:25,073][Main][INFO] - [train] Step 17000 out of 80000 | Loss --> 2.298 | 
Grad_l2 --> 0.448 | Weights_l2 --> 8826.792 | Lr --> 0.008 | Seconds_per_step --> 4.200 | [2024-08-10 04:47:13,391][Main][INFO] - [train] Step 17050 out of 80000 | Loss --> 2.300 | Grad_l2 --> 0.441 | Weights_l2 --> 8827.769 | Lr --> 0.008 | Seconds_per_step --> 3.366 | [2024-08-10 04:50:17,595][Main][INFO] - [train] Step 17100 out of 80000 | Loss --> 2.300 | Grad_l2 --> 0.439 | Weights_l2 --> 8828.744 | Lr --> 0.008 | Seconds_per_step --> 3.684 | [2024-08-10 04:53:21,981][Main][INFO] - [train] Step 17150 out of 80000 | Loss --> 2.300 | Grad_l2 --> 0.443 | Weights_l2 --> 8829.696 | Lr --> 0.008 | Seconds_per_step --> 3.688 | [2024-08-10 04:56:15,559][Main][INFO] - [train] Step 17200 out of 80000 | Loss --> 2.301 | Grad_l2 --> 0.447 | Weights_l2 --> 8830.652 | Lr --> 0.008 | Seconds_per_step --> 3.472 | [2024-08-10 04:59:19,644][Main][INFO] - [train] Step 17250 out of 80000 | Loss --> 2.299 | Grad_l2 --> 0.441 | Weights_l2 --> 8831.603 | Lr --> 0.008 | Seconds_per_step --> 3.682 | [2024-08-10 05:03:08,540][Main][INFO] - [train] Step 17300 out of 80000 | Loss --> 2.298 | Grad_l2 --> 0.441 | Weights_l2 --> 8832.566 | Lr --> 0.008 | Seconds_per_step --> 4.578 | [2024-08-10 05:06:12,612][Main][INFO] - [train] Step 17350 out of 80000 | Loss --> 2.292 | Grad_l2 --> 0.442 | Weights_l2 --> 8833.511 | Lr --> 0.008 | Seconds_per_step --> 3.681 | [2024-08-10 05:09:09,744][Main][INFO] - [train] Step 17400 out of 80000 | Loss --> 2.295 | Grad_l2 --> 0.436 | Weights_l2 --> 8834.479 | Lr --> 0.008 | Seconds_per_step --> 3.543 | [2024-08-10 05:12:04,488][Main][INFO] - [train] Step 17450 out of 80000 | Loss --> 2.292 | Grad_l2 --> 0.441 | Weights_l2 --> 8835.436 | Lr --> 0.008 | Seconds_per_step --> 3.495 | [2024-08-10 05:15:05,017][Main][INFO] - [train] Step 17500 out of 80000 | Loss --> 2.293 | Grad_l2 --> 0.446 | Weights_l2 --> 8836.425 | Lr --> 0.008 | Seconds_per_step --> 3.611 | [2024-08-10 05:17:58,854][Main][INFO] - [train] Step 17550 out of 80000 | Loss --> 2.286 | Grad_l2 
--> 0.437 | Weights_l2 --> 8837.398 | Lr --> 0.008 | Seconds_per_step --> 3.477 | [2024-08-10 05:21:01,251][Main][INFO] - [train] Step 17600 out of 80000 | Loss --> 2.293 | Grad_l2 --> 0.438 | Weights_l2 --> 8838.359 | Lr --> 0.008 | Seconds_per_step --> 3.648 | [2024-08-10 05:23:50,306][Main][INFO] - [train] Step 17650 out of 80000 | Loss --> 2.290 | Grad_l2 --> 0.440 | Weights_l2 --> 8839.301 | Lr --> 0.008 | Seconds_per_step --> 3.381 | [2024-08-10 05:26:39,934][Main][INFO] - [train] Step 17700 out of 80000 | Loss --> 2.279 | Grad_l2 --> 0.437 | Weights_l2 --> 8840.279 | Lr --> 0.008 | Seconds_per_step --> 3.393 | [2024-08-10 05:29:31,132][Main][INFO] - [train] Step 17750 out of 80000 | Loss --> 2.295 | Grad_l2 --> 0.435 | Weights_l2 --> 8841.235 | Lr --> 0.008 | Seconds_per_step --> 3.424 | [2024-08-10 05:32:28,592][Main][INFO] - [train] Step 17800 out of 80000 | Loss --> 2.285 | Grad_l2 --> 0.439 | Weights_l2 --> 8842.177 | Lr --> 0.008 | Seconds_per_step --> 3.549 | [2024-08-10 05:35:29,530][Main][INFO] - [train] Step 17850 out of 80000 | Loss --> 2.278 | Grad_l2 --> 0.438 | Weights_l2 --> 8843.147 | Lr --> 0.008 | Seconds_per_step --> 3.619 | [2024-08-10 05:38:26,746][Main][INFO] - [train] Step 17900 out of 80000 | Loss --> 2.280 | Grad_l2 --> 0.433 | Weights_l2 --> 8844.071 | Lr --> 0.008 | Seconds_per_step --> 3.544 | [2024-08-10 05:41:16,386][Main][INFO] - [train] Step 17950 out of 80000 | Loss --> 2.279 | Grad_l2 --> 0.429 | Weights_l2 --> 8845.032 | Lr --> 0.008 | Seconds_per_step --> 3.393 | [2024-08-10 05:44:05,564][Main][INFO] - [train] Step 18000 out of 80000 | Loss --> 2.274 | Grad_l2 --> 0.439 | Weights_l2 --> 8845.972 | Lr --> 0.008 | Seconds_per_step --> 3.384 | [2024-08-10 05:46:54,634][Main][INFO] - [train] Step 18050 out of 80000 | Loss --> 2.272 | Grad_l2 --> 0.434 | Weights_l2 --> 8846.896 | Lr --> 0.008 | Seconds_per_step --> 3.381 | [2024-08-10 05:49:43,407][Main][INFO] - [train] Step 18100 out of 80000 | Loss --> 2.269 | Grad_l2 --> 
0.430 | Weights_l2 --> 8847.847 | Lr --> 0.008 | Seconds_per_step --> 3.375 | [2024-08-10 05:52:32,975][Main][INFO] - [train] Step 18150 out of 80000 | Loss --> 2.268 | Grad_l2 --> 0.433 | Weights_l2 --> 8848.785 | Lr --> 0.008 | Seconds_per_step --> 3.391 | [2024-08-10 05:55:22,235][Main][INFO] - [train] Step 18200 out of 80000 | Loss --> 2.274 | Grad_l2 --> 0.428 | Weights_l2 --> 8849.744 | Lr --> 0.008 | Seconds_per_step --> 3.385 | [2024-08-10 05:58:12,437][Main][INFO] - [train] Step 18250 out of 80000 | Loss --> 2.274 | Grad_l2 --> 0.438 | Weights_l2 --> 8850.687 | Lr --> 0.008 | Seconds_per_step --> 3.404 | [2024-08-10 06:01:01,584][Main][INFO] - [train] Step 18300 out of 80000 | Loss --> 2.272 | Grad_l2 --> 0.437 | Weights_l2 --> 8851.617 | Lr --> 0.008 | Seconds_per_step --> 3.383 | [2024-08-10 06:03:50,792][Main][INFO] - [train] Step 18350 out of 80000 | Loss --> 2.262 | Grad_l2 --> 0.425 | Weights_l2 --> 8852.557 | Lr --> 0.008 | Seconds_per_step --> 3.384 | [2024-08-10 06:06:40,061][Main][INFO] - [train] Step 18400 out of 80000 | Loss --> 2.265 | Grad_l2 --> 0.427 | Weights_l2 --> 8853.478 | Lr --> 0.008 | Seconds_per_step --> 3.385 | [2024-08-10 06:09:28,691][Main][INFO] - [train] Step 18450 out of 80000 | Loss --> 2.250 | Grad_l2 --> 0.427 | Weights_l2 --> 8854.413 | Lr --> 0.008 | Seconds_per_step --> 3.373 | [2024-08-10 06:12:17,730][Main][INFO] - [train] Step 18500 out of 80000 | Loss --> 2.258 | Grad_l2 --> 0.432 | Weights_l2 --> 8855.354 | Lr --> 0.008 | Seconds_per_step --> 3.381 | [2024-08-10 06:15:06,731][Main][INFO] - [train] Step 18550 out of 80000 | Loss --> 2.264 | Grad_l2 --> 0.428 | Weights_l2 --> 8856.260 | Lr --> 0.008 | Seconds_per_step --> 3.380 | [2024-08-10 06:17:55,946][Main][INFO] - [train] Step 18600 out of 80000 | Loss --> 2.257 | Grad_l2 --> 0.429 | Weights_l2 --> 8857.174 | Lr --> 0.008 | Seconds_per_step --> 3.384 | [2024-08-10 06:20:45,707][Main][INFO] - [train] Step 18650 out of 80000 | Loss --> 2.251 | Grad_l2 --> 0.427 | 
Weights_l2 --> 8858.085 | Lr --> 0.008 | Seconds_per_step --> 3.395 | [2024-08-10 06:23:34,836][Main][INFO] - [train] Step 18700 out of 80000 | Loss --> 2.262 | Grad_l2 --> 0.427 | Weights_l2 --> 8859.009 | Lr --> 0.008 | Seconds_per_step --> 3.383 | [2024-08-10 06:26:23,643][Main][INFO] - [train] Step 18750 out of 80000 | Loss --> 2.253 | Grad_l2 --> 0.421 | Weights_l2 --> 8859.915 | Lr --> 0.008 | Seconds_per_step --> 3.376 | [2024-08-10 06:29:11,354][Main][INFO] - [train] Step 18800 out of 80000 | Loss --> 2.250 | Grad_l2 --> 0.424 | Weights_l2 --> 8860.852 | Lr --> 0.008 | Seconds_per_step --> 3.354 | [2024-08-10 06:32:00,884][Main][INFO] - [train] Step 18850 out of 80000 | Loss --> 2.242 | Grad_l2 --> 0.422 | Weights_l2 --> 8861.763 | Lr --> 0.008 | Seconds_per_step --> 3.391 | [2024-08-10 06:34:48,915][Main][INFO] - [train] Step 18900 out of 80000 | Loss --> 2.253 | Grad_l2 --> 0.419 | Weights_l2 --> 8862.654 | Lr --> 0.008 | Seconds_per_step --> 3.361 | [2024-08-10 06:37:37,887][Main][INFO] - [train] Step 18950 out of 80000 | Loss --> 2.241 | Grad_l2 --> 0.420 | Weights_l2 --> 8863.549 | Lr --> 0.008 | Seconds_per_step --> 3.379 | [2024-08-10 06:40:26,873][Main][INFO] - [train] Step 19000 out of 80000 | Loss --> 2.256 | Grad_l2 --> 0.419 | Weights_l2 --> 8864.466 | Lr --> 0.008 | Seconds_per_step --> 3.380 | [2024-08-10 06:43:16,993][Main][INFO] - [train] Step 19050 out of 80000 | Loss --> 2.244 | Grad_l2 --> 0.415 | Weights_l2 --> 8865.372 | Lr --> 0.008 | Seconds_per_step --> 3.402 | [2024-08-10 06:46:06,020][Main][INFO] - [train] Step 19100 out of 80000 | Loss --> 2.242 | Grad_l2 --> 0.415 | Weights_l2 --> 8866.273 | Lr --> 0.008 | Seconds_per_step --> 3.381 | [2024-08-10 06:48:54,927][Main][INFO] - [train] Step 19150 out of 80000 | Loss --> 2.241 | Grad_l2 --> 0.419 | Weights_l2 --> 8867.179 | Lr --> 0.008 | Seconds_per_step --> 3.378 | [2024-08-10 06:51:43,818][Main][INFO] - [train] Step 19200 out of 80000 | Loss --> 2.252 | Grad_l2 --> 0.418 | 
Weights_l2 --> 8868.077 | Lr --> 0.008 | Seconds_per_step --> 3.378 | [2024-08-10 06:54:33,553][Main][INFO] - [train] Step 19250 out of 80000 | Loss --> 2.245 | Grad_l2 --> 0.416 | Weights_l2 --> 8868.959 | Lr --> 0.008 | Seconds_per_step --> 3.395 | [2024-08-10 06:57:22,776][Main][INFO] - [train] Step 19300 out of 80000 | Loss --> 2.239 | Grad_l2 --> 0.419 | Weights_l2 --> 8869.832 | Lr --> 0.008 | Seconds_per_step --> 3.384 | [2024-08-10 07:00:11,595][Main][INFO] - [train] Step 19350 out of 80000 | Loss --> 2.237 | Grad_l2 --> 0.414 | Weights_l2 --> 8870.736 | Lr --> 0.008 | Seconds_per_step --> 3.376 | [2024-08-10 07:03:00,088][Main][INFO] - [train] Step 19400 out of 80000 | Loss --> 2.224 | Grad_l2 --> 0.416 | Weights_l2 --> 8871.632 | Lr --> 0.008 | Seconds_per_step --> 3.370 | [2024-08-10 07:05:49,818][Main][INFO] - [train] Step 19450 out of 80000 | Loss --> 2.228 | Grad_l2 --> 0.416 | Weights_l2 --> 8872.502 | Lr --> 0.008 | Seconds_per_step --> 3.395 | [2024-08-10 07:08:38,161][Main][INFO] - [train] Step 19500 out of 80000 | Loss --> 2.229 | Grad_l2 --> 0.412 | Weights_l2 --> 8873.395 | Lr --> 0.008 | Seconds_per_step --> 3.367 | [2024-08-10 07:11:26,283][Main][INFO] - [train] Step 19550 out of 80000 | Loss --> 2.228 | Grad_l2 --> 0.418 | Weights_l2 --> 8874.282 | Lr --> 0.008 | Seconds_per_step --> 3.362 | [2024-08-10 07:14:14,560][Main][INFO] - [train] Step 19600 out of 80000 | Loss --> 2.228 | Grad_l2 --> 0.409 | Weights_l2 --> 8875.190 | Lr --> 0.008 | Seconds_per_step --> 3.366 | [2024-08-10 07:17:03,585][Main][INFO] - [train] Step 19650 out of 80000 | Loss --> 2.224 | Grad_l2 --> 0.408 | Weights_l2 --> 8876.067 | Lr --> 0.008 | Seconds_per_step --> 3.380 | [2024-08-10 07:19:52,539][Main][INFO] - [train] Step 19700 out of 80000 | Loss --> 2.223 | Grad_l2 --> 0.411 | Weights_l2 --> 8876.951 | Lr --> 0.008 | Seconds_per_step --> 3.379 | [2024-08-10 07:22:41,624][Main][INFO] - [train] Step 19750 out of 80000 | Loss --> 2.204 | Grad_l2 --> 0.407 | 
Weights_l2 --> 8877.825 | Lr --> 0.008 | Seconds_per_step --> 3.382 | [2024-08-10 07:25:30,616][Main][INFO] - [train] Step 19800 out of 80000 | Loss --> 2.228 | Grad_l2 --> 0.412 | Weights_l2 --> 8878.734 | Lr --> 0.008 | Seconds_per_step --> 3.380 | [2024-08-10 07:28:19,413][Main][INFO] - [train] Step 19850 out of 80000 | Loss --> 2.219 | Grad_l2 --> 0.407 | Weights_l2 --> 8879.608 | Lr --> 0.008 | Seconds_per_step --> 3.376 | [2024-08-10 07:31:08,190][Main][INFO] - [train] Step 19900 out of 80000 | Loss --> 2.221 | Grad_l2 --> 0.405 | Weights_l2 --> 8880.499 | Lr --> 0.008 | Seconds_per_step --> 3.376 | [2024-08-10 07:33:58,992][Main][INFO] - [train] Step 19950 out of 80000 | Loss --> 2.209 | Grad_l2 --> 0.411 | Weights_l2 --> 8881.370 | Lr --> 0.008 | Seconds_per_step --> 3.416 | [2024-08-10 07:36:48,090][Main][INFO] - [train] Step 20000 out of 80000 | Loss --> 2.216 | Grad_l2 --> 0.408 | Weights_l2 --> 8882.234 | Lr --> 0.008 | Seconds_per_step --> 3.382 | [2024-08-10 07:36:48,091][accelerate.accelerator][INFO] - Saving current state to checkpoint-pt-20000 [2024-08-10 07:36:48,094][accelerate.utils.other][WARNING] - Removed shared tensor {'encoder.embed_tokens.weight', 'decoder.embed_tokens.weight'} while saving. 
This should be OK, but check by verifying that you don't receive any warning while reloading [2024-08-10 07:36:50,075][accelerate.checkpointing][INFO] - Model weights saved in checkpoint-pt-20000/model.safetensors [2024-08-10 07:36:52,974][accelerate.checkpointing][INFO] - Optimizer state saved in checkpoint-pt-20000/optimizer.bin [2024-08-10 07:36:52,974][accelerate.checkpointing][INFO] - Scheduler state saved in checkpoint-pt-20000/scheduler.bin [2024-08-10 07:36:52,975][accelerate.checkpointing][INFO] - Sampler state for dataloader 0 saved in checkpoint-pt-20000/sampler.bin [2024-08-10 07:36:52,975][accelerate.checkpointing][INFO] - Sampler state for dataloader 1 saved in checkpoint-pt-20000/sampler_1.bin [2024-08-10 07:36:52,975][accelerate.checkpointing][INFO] - Random states saved in checkpoint-pt-20000/random_states_0.pkl [2024-08-10 07:39:42,190][Main][INFO] - [train] Step 20050 out of 80000 | Loss --> 2.200 | Grad_l2 --> 0.407 | Weights_l2 --> 8883.086 | Lr --> 0.008 | Seconds_per_step --> 3.482 | [2024-08-10 07:42:32,658][Main][INFO] - [train] Step 20100 out of 80000 | Loss --> 2.198 | Grad_l2 --> 0.404 | Weights_l2 --> 8883.945 | Lr --> 0.008 | Seconds_per_step --> 3.409 | [2024-08-10 07:45:21,533][Main][INFO] - [train] Step 20150 out of 80000 | Loss --> 2.202 | Grad_l2 --> 0.408 | Weights_l2 --> 8884.806 | Lr --> 0.008 | Seconds_per_step --> 3.377 | [2024-08-10 07:48:10,447][Main][INFO] - [train] Step 20200 out of 80000 | Loss --> 2.203 | Grad_l2 --> 0.407 | Weights_l2 --> 8885.699 | Lr --> 0.008 | Seconds_per_step --> 3.378 | [2024-08-10 07:50:58,905][Main][INFO] - [train] Step 20250 out of 80000 | Loss --> 2.196 | Grad_l2 --> 0.404 | Weights_l2 --> 8886.567 | Lr --> 0.008 | Seconds_per_step --> 3.369 | [2024-08-10 07:53:48,181][Main][INFO] - [train] Step 20300 out of 80000 | Loss --> 2.197 | Grad_l2 --> 0.406 | Weights_l2 --> 8887.444 | Lr --> 0.008 | Seconds_per_step --> 3.386 | [2024-08-10 07:56:36,986][Main][INFO] - [train] Step 20350 out of 80000 
| Loss --> 2.196 | Grad_l2 --> 0.403 | Weights_l2 --> 8888.293 | Lr --> 0.008 | Seconds_per_step --> 3.376 | [2024-08-10 07:59:25,941][Main][INFO] - [train] Step 20400 out of 80000 | Loss --> 2.193 | Grad_l2 --> 0.406 | Weights_l2 --> 8889.139 | Lr --> 0.008 | Seconds_per_step --> 3.379 | [2024-08-10 08:02:15,456][Main][INFO] - [train] Step 20450 out of 80000 | Loss --> 2.192 | Grad_l2 --> 0.407 | Weights_l2 --> 8889.993 | Lr --> 0.008 | Seconds_per_step --> 3.390 | [2024-08-10 08:05:04,967][Main][INFO] - [train] Step 20500 out of 80000 | Loss --> 2.198 | Grad_l2 --> 0.399 | Weights_l2 --> 8890.854 | Lr --> 0.008 | Seconds_per_step --> 3.390 | [2024-08-10 08:07:54,096][Main][INFO] - [train] Step 20550 out of 80000 | Loss --> 2.189 | Grad_l2 --> 0.403 | Weights_l2 --> 8891.715 | Lr --> 0.008 | Seconds_per_step --> 3.383 | [2024-08-10 08:10:43,683][Main][INFO] - [train] Step 20600 out of 80000 | Loss --> 2.193 | Grad_l2 --> 0.398 | Weights_l2 --> 8892.575 | Lr --> 0.008 | Seconds_per_step --> 3.392 | [2024-08-10 08:13:31,893][Main][INFO] - [train] Step 20650 out of 80000 | Loss --> 2.186 | Grad_l2 --> 0.403 | Weights_l2 --> 8893.421 | Lr --> 0.008 | Seconds_per_step --> 3.364 | [2024-08-10 08:16:21,983][Main][INFO] - [train] Step 20700 out of 80000 | Loss --> 2.183 | Grad_l2 --> 0.399 | Weights_l2 --> 8894.287 | Lr --> 0.008 | Seconds_per_step --> 3.402 | [2024-08-10 08:19:11,096][Main][INFO] - [train] Step 20750 out of 80000 | Loss --> 2.183 | Grad_l2 --> 0.397 | Weights_l2 --> 8895.130 | Lr --> 0.008 | Seconds_per_step --> 3.382 | [2024-08-10 08:21:59,564][Main][INFO] - [train] Step 20800 out of 80000 | Loss --> 2.176 | Grad_l2 --> 0.404 | Weights_l2 --> 8895.973 | Lr --> 0.008 | Seconds_per_step --> 3.369 | [2024-08-10 08:24:48,772][Main][INFO] - [train] Step 20850 out of 80000 | Loss --> 2.183 | Grad_l2 --> 0.399 | Weights_l2 --> 8896.827 | Lr --> 0.008 | Seconds_per_step --> 3.384 | [2024-08-10 08:27:39,040][Main][INFO] - [train] Step 20900 out of 80000 | Loss 
--> 2.181 | Grad_l2 --> 0.398 | Weights_l2 --> 8897.692 | Lr --> 0.008 | Seconds_per_step --> 3.405 | [2024-08-10 08:30:28,088][Main][INFO] - [train] Step 20950 out of 80000 | Loss --> 2.169 | Grad_l2 --> 0.396 | Weights_l2 --> 8898.537 | Lr --> 0.008 | Seconds_per_step --> 3.381 | [2024-08-10 08:33:15,709][Main][INFO] - [train] Step 21000 out of 80000 | Loss --> 2.174 | Grad_l2 --> 0.403 | Weights_l2 --> 8899.403 | Lr --> 0.008 | Seconds_per_step --> 3.352 | [2024-08-10 08:36:05,184][Main][INFO] - [train] Step 21050 out of 80000 | Loss --> 2.173 | Grad_l2 --> 0.395 | Weights_l2 --> 8900.227 | Lr --> 0.008 | Seconds_per_step --> 3.389 | [2024-08-10 08:38:54,935][Main][INFO] - [train] Step 21100 out of 80000 | Loss --> 2.164 | Grad_l2 --> 0.394 | Weights_l2 --> 8901.058 | Lr --> 0.008 | Seconds_per_step --> 3.395 | [2024-08-10 08:41:44,183][Main][INFO] - [train] Step 21150 out of 80000 | Loss --> 2.172 | Grad_l2 --> 0.396 | Weights_l2 --> 8901.899 | Lr --> 0.008 | Seconds_per_step --> 3.385 | [2024-08-10 08:44:34,115][Main][INFO] - [train] Step 21200 out of 80000 | Loss --> 2.164 | Grad_l2 --> 0.399 | Weights_l2 --> 8902.719 | Lr --> 0.008 | Seconds_per_step --> 3.399 | [2024-08-10 08:47:23,707][Main][INFO] - [train] Step 21250 out of 80000 | Loss --> 2.162 | Grad_l2 --> 0.399 | Weights_l2 --> 8903.545 | Lr --> 0.008 | Seconds_per_step --> 3.392 | [2024-08-10 08:50:13,259][Main][INFO] - [train] Step 21300 out of 80000 | Loss --> 2.152 | Grad_l2 --> 0.394 | Weights_l2 --> 8904.388 | Lr --> 0.007 | Seconds_per_step --> 3.391 | [2024-08-10 08:53:03,350][Main][INFO] - [train] Step 21350 out of 80000 | Loss --> 2.161 | Grad_l2 --> 0.397 | Weights_l2 --> 8905.227 | Lr --> 0.007 | Seconds_per_step --> 3.402 | [2024-08-10 08:55:51,861][Main][INFO] - [train] Step 21400 out of 80000 | Loss --> 2.153 | Grad_l2 --> 0.396 | Weights_l2 --> 8906.073 | Lr --> 0.007 | Seconds_per_step --> 3.370 | [2024-08-10 08:58:41,307][Main][INFO] - [train] Step 21450 out of 80000 | Loss --> 
2.151 | Grad_l2 --> 0.390 | Weights_l2 --> 8906.885 | Lr --> 0.007 | Seconds_per_step --> 3.389 | [2024-08-10 09:01:31,004][Main][INFO] - [train] Step 21500 out of 80000 | Loss --> 2.152 | Grad_l2 --> 0.390 | Weights_l2 --> 8907.704 | Lr --> 0.007 | Seconds_per_step --> 3.394 | [2024-08-10 09:04:20,626][Main][INFO] - [train] Step 21550 out of 80000 | Loss --> 2.140 | Grad_l2 --> 0.392 | Weights_l2 --> 8908.519 | Lr --> 0.007 | Seconds_per_step --> 3.392 | [2024-08-10 09:07:09,329][Main][INFO] - [train] Step 21600 out of 80000 | Loss --> 2.142 | Grad_l2 --> 0.392 | Weights_l2 --> 8909.337 | Lr --> 0.007 | Seconds_per_step --> 3.374 | [2024-08-10 09:09:58,896][Main][INFO] - [train] Step 21650 out of 80000 | Loss --> 2.142 | Grad_l2 --> 0.396 | Weights_l2 --> 8910.161 | Lr --> 0.007 | Seconds_per_step --> 3.391 | [2024-08-10 09:12:47,641][Main][INFO] - [train] Step 21700 out of 80000 | Loss --> 2.138 | Grad_l2 --> 0.393 | Weights_l2 --> 8910.985 | Lr --> 0.007 | Seconds_per_step --> 3.375 | [2024-08-10 09:15:36,715][Main][INFO] - [train] Step 21750 out of 80000 | Loss --> 2.137 | Grad_l2 --> 0.389 | Weights_l2 --> 8911.803 | Lr --> 0.007 | Seconds_per_step --> 3.381 | [2024-08-10 09:18:25,804][Main][INFO] - [train] Step 21800 out of 80000 | Loss --> 2.124 | Grad_l2 --> 0.388 | Weights_l2 --> 8912.593 | Lr --> 0.007 | Seconds_per_step --> 3.382 | [2024-08-10 09:21:14,976][Main][INFO] - [train] Step 21850 out of 80000 | Loss --> 2.127 | Grad_l2 --> 0.388 | Weights_l2 --> 8913.424 | Lr --> 0.007 | Seconds_per_step --> 3.383 | [2024-08-10 09:24:04,906][Main][INFO] - [train] Step 21900 out of 80000 | Loss --> 2.127 | Grad_l2 --> 0.391 | Weights_l2 --> 8914.230 | Lr --> 0.007 | Seconds_per_step --> 3.399 | [2024-08-10 09:26:55,122][Main][INFO] - [train] Step 21950 out of 80000 | Loss --> 2.129 | Grad_l2 --> 0.389 | Weights_l2 --> 8915.052 | Lr --> 0.007 | Seconds_per_step --> 3.404 | [2024-08-10 09:29:44,540][Main][INFO] - [train] Step 22000 out of 80000 | Loss --> 2.125 | 
Grad_l2 --> 0.389 | Weights_l2 --> 8915.853 | Lr --> 0.007 | Seconds_per_step --> 3.388 | [2024-08-10 09:32:34,046][Main][INFO] - [train] Step 22050 out of 80000 | Loss --> 2.122 | Grad_l2 --> 0.395 | Weights_l2 --> 8916.661 | Lr --> 0.007 | Seconds_per_step --> 3.390 | [2024-08-10 09:35:21,952][Main][INFO] - [train] Step 22100 out of 80000 | Loss --> 2.119 | Grad_l2 --> 0.385 | Weights_l2 --> 8917.495 | Lr --> 0.007 | Seconds_per_step --> 3.358 | [2024-08-10 09:38:11,003][Main][INFO] - [train] Step 22150 out of 80000 | Loss --> 2.122 | Grad_l2 --> 0.391 | Weights_l2 --> 8918.291 | Lr --> 0.007 | Seconds_per_step --> 3.381 | [2024-08-10 09:40:58,413][Main][INFO] - [train] Step 22200 out of 80000 | Loss --> 2.117 | Grad_l2 --> 0.387 | Weights_l2 --> 8919.096 | Lr --> 0.007 | Seconds_per_step --> 3.348 | [2024-08-10 09:43:47,159][Main][INFO] - [train] Step 22250 out of 80000 | Loss --> 2.121 | Grad_l2 --> 0.384 | Weights_l2 --> 8919.898 | Lr --> 0.007 | Seconds_per_step --> 3.375 | [2024-08-10 09:46:36,610][Main][INFO] - [train] Step 22300 out of 80000 | Loss --> 2.113 | Grad_l2 --> 0.388 | Weights_l2 --> 8920.716 | Lr --> 0.007 | Seconds_per_step --> 3.389 | [2024-08-10 09:49:27,851][Main][INFO] - [train] Step 22350 out of 80000 | Loss --> 2.115 | Grad_l2 --> 0.384 | Weights_l2 --> 8921.523 | Lr --> 0.007 | Seconds_per_step --> 3.425 | [2024-08-10 09:52:17,105][Main][INFO] - [train] Step 22400 out of 80000 | Loss --> 2.114 | Grad_l2 --> 0.388 | Weights_l2 --> 8922.301 | Lr --> 0.007 | Seconds_per_step --> 3.385 | [2024-08-10 09:55:06,547][Main][INFO] - [train] Step 22450 out of 80000 | Loss --> 2.123 | Grad_l2 --> 0.384 | Weights_l2 --> 8923.100 | Lr --> 0.007 | Seconds_per_step --> 3.389 | [2024-08-10 09:57:55,448][Main][INFO] - [train] Step 22500 out of 80000 | Loss --> 2.118 | Grad_l2 --> 0.386 | Weights_l2 --> 8923.925 | Lr --> 0.007 | Seconds_per_step --> 3.378 | [2024-08-10 10:00:44,643][Main][INFO] - [train] Step 22550 out of 80000 | Loss --> 2.113 | Grad_l2 
--> 0.383 | Weights_l2 --> 8924.712 | Lr --> 0.007 | Seconds_per_step --> 3.384 | [2024-08-10 10:03:33,602][Main][INFO] - [train] Step 22600 out of 80000 | Loss --> 2.121 | Grad_l2 --> 0.386 | Weights_l2 --> 8925.491 | Lr --> 0.007 | Seconds_per_step --> 3.379 | [2024-08-10 10:06:22,538][Main][INFO] - [train] Step 22650 out of 80000 | Loss --> 2.113 | Grad_l2 --> 0.384 | Weights_l2 --> 8926.277 | Lr --> 0.007 | Seconds_per_step --> 3.379 | [2024-08-10 10:09:11,569][Main][INFO] - [train] Step 22700 out of 80000 | Loss --> 2.107 | Grad_l2 --> 0.385 | Weights_l2 --> 8927.058 | Lr --> 0.007 | Seconds_per_step --> 3.381 | [2024-08-10 10:12:00,266][Main][INFO] - [train] Step 22750 out of 80000 | Loss --> 2.106 | Grad_l2 --> 0.386 | Weights_l2 --> 8927.846 | Lr --> 0.007 | Seconds_per_step --> 3.374 | [2024-08-10 10:14:49,150][Main][INFO] - [train] Step 22800 out of 80000 | Loss --> 2.119 | Grad_l2 --> 0.382 | Weights_l2 --> 8928.630 | Lr --> 0.007 | Seconds_per_step --> 3.378 | [2024-08-10 10:17:37,676][Main][INFO] - [train] Step 22850 out of 80000 | Loss --> 2.111 | Grad_l2 --> 0.383 | Weights_l2 --> 8929.421 | Lr --> 0.007 | Seconds_per_step --> 3.371 | [2024-08-10 10:20:27,046][Main][INFO] - [train] Step 22900 out of 80000 | Loss --> 2.111 | Grad_l2 --> 0.380 | Weights_l2 --> 8930.220 | Lr --> 0.007 | Seconds_per_step --> 3.387 | [2024-08-10 10:23:16,675][Main][INFO] - [train] Step 22950 out of 80000 | Loss --> 2.115 | Grad_l2 --> 0.383 | Weights_l2 --> 8931.007 | Lr --> 0.007 | Seconds_per_step --> 3.393 | [2024-08-10 10:26:05,843][Main][INFO] - [train] Step 23000 out of 80000 | Loss --> 2.120 | Grad_l2 --> 0.381 | Weights_l2 --> 8931.786 | Lr --> 0.007 | Seconds_per_step --> 3.383 | [2024-08-10 10:28:54,942][Main][INFO] - [train] Step 23050 out of 80000 | Loss --> 2.116 | Grad_l2 --> 0.386 | Weights_l2 --> 8932.580 | Lr --> 0.007 | Seconds_per_step --> 3.382 | [2024-08-10 10:31:44,834][Main][INFO] - [train] Step 23100 out of 80000 | Loss --> 2.115 | Grad_l2 --> 
0.380 | Weights_l2 --> 8933.380 | Lr --> 0.007 | Seconds_per_step --> 3.398 | [2024-08-10 10:34:34,786][Main][INFO] - [train] Step 23150 out of 80000 | Loss --> 2.113 | Grad_l2 --> 0.377 | Weights_l2 --> 8934.167 | Lr --> 0.007 | Seconds_per_step --> 3.399 | [2024-08-10 10:37:24,223][Main][INFO] - [train] Step 23200 out of 80000 | Loss --> 2.106 | Grad_l2 --> 0.377 | Weights_l2 --> 8934.945 | Lr --> 0.007 | Seconds_per_step --> 3.389 | [2024-08-10 10:40:13,606][Main][INFO] - [train] Step 23250 out of 80000 | Loss --> 2.109 | Grad_l2 --> 0.381 | Weights_l2 --> 8935.739 | Lr --> 0.007 | Seconds_per_step --> 3.388 | [2024-08-10 10:43:03,034][Main][INFO] - [train] Step 23300 out of 80000 | Loss --> 2.105 | Grad_l2 --> 0.380 | Weights_l2 --> 8936.510 | Lr --> 0.007 | Seconds_per_step --> 3.389 | [2024-08-10 10:45:52,352][Main][INFO] - [train] Step 23350 out of 80000 | Loss --> 2.115 | Grad_l2 --> 0.381 | Weights_l2 --> 8937.305 | Lr --> 0.007 | Seconds_per_step --> 3.386 | [2024-08-10 10:48:42,504][Main][INFO] - [train] Step 23400 out of 80000 | Loss --> 2.108 | Grad_l2 --> 0.381 | Weights_l2 --> 8938.113 | Lr --> 0.007 | Seconds_per_step --> 3.403 | [2024-08-10 10:51:32,165][Main][INFO] - [train] Step 23450 out of 80000 | Loss --> 2.107 | Grad_l2 --> 0.375 | Weights_l2 --> 8938.876 | Lr --> 0.007 | Seconds_per_step --> 3.393 | [2024-08-10 10:54:21,677][Main][INFO] - [train] Step 23500 out of 80000 | Loss --> 2.100 | Grad_l2 --> 0.380 | Weights_l2 --> 8939.651 | Lr --> 0.007 | Seconds_per_step --> 3.390 | [2024-08-10 10:57:10,744][Main][INFO] - [train] Step 23550 out of 80000 | Loss --> 2.101 | Grad_l2 --> 0.381 | Weights_l2 --> 8940.430 | Lr --> 0.007 | Seconds_per_step --> 3.381 | [2024-08-10 11:00:00,750][Main][INFO] - [train] Step 23600 out of 80000 | Loss --> 2.105 | Grad_l2 --> 0.376 | Weights_l2 --> 8941.201 | Lr --> 0.007 | Seconds_per_step --> 3.400 | [2024-08-10 11:02:49,980][Main][INFO] - [train] Step 23650 out of 80000 | Loss --> 2.106 | Grad_l2 --> 0.377 | 
Weights_l2 --> 8941.992 | Lr --> 0.007 | Seconds_per_step --> 3.385 | [2024-08-10 11:05:38,767][Main][INFO] - [train] Step 23700 out of 80000 | Loss --> 2.096 | Grad_l2 --> 0.380 | Weights_l2 --> 8942.754 | Lr --> 0.007 | Seconds_per_step --> 3.376 | [2024-08-10 11:08:28,650][Main][INFO] - [train] Step 23750 out of 80000 | Loss --> 2.096 | Grad_l2 --> 0.380 | Weights_l2 --> 8943.525 | Lr --> 0.007 | Seconds_per_step --> 3.398 | [2024-08-10 11:11:17,799][Main][INFO] - [train] Step 23800 out of 80000 | Loss --> 2.106 | Grad_l2 --> 0.378 | Weights_l2 --> 8944.294 | Lr --> 0.007 | Seconds_per_step --> 3.383 | [2024-08-10 11:14:06,944][Main][INFO] - [train] Step 23850 out of 80000 | Loss --> 2.095 | Grad_l2 --> 0.373 | Weights_l2 --> 8945.061 | Lr --> 0.007 | Seconds_per_step --> 3.383 | [2024-08-10 11:16:56,683][Main][INFO] - [train] Step 23900 out of 80000 | Loss --> 2.101 | Grad_l2 --> 0.376 | Weights_l2 --> 8945.835 | Lr --> 0.007 | Seconds_per_step --> 3.395 | [2024-08-10 11:19:45,844][Main][INFO] - [train] Step 23950 out of 80000 | Loss --> 2.092 | Grad_l2 --> 0.375 | Weights_l2 --> 8946.629 | Lr --> 0.007 | Seconds_per_step --> 3.383 | [2024-08-10 11:22:35,661][Main][INFO] - [train] Step 24000 out of 80000 | Loss --> 2.096 | Grad_l2 --> 0.377 | Weights_l2 --> 8947.382 | Lr --> 0.007 | Seconds_per_step --> 3.396 | [2024-08-10 11:25:23,611][Main][INFO] - [train] Step 24050 out of 80000 | Loss --> 2.094 | Grad_l2 --> 0.374 | Weights_l2 --> 8948.130 | Lr --> 0.007 | Seconds_per_step --> 3.359 | [2024-08-10 11:28:12,984][Main][INFO] - [train] Step 24100 out of 80000 | Loss --> 2.095 | Grad_l2 --> 0.373 | Weights_l2 --> 8948.867 | Lr --> 0.007 | Seconds_per_step --> 3.387 | [2024-08-10 11:31:01,571][Main][INFO] - [train] Step 24150 out of 80000 | Loss --> 2.095 | Grad_l2 --> 0.374 | Weights_l2 --> 8949.631 | Lr --> 0.007 | Seconds_per_step --> 3.372 | [2024-08-10 11:33:50,863][Main][INFO] - [train] Step 24200 out of 80000 | Loss --> 2.097 | Grad_l2 --> 0.376 | 
Weights_l2 --> 8950.388 | Lr --> 0.007 | Seconds_per_step --> 3.386 | [2024-08-10 11:36:40,686][Main][INFO] - [train] Step 24250 out of 80000 | Loss --> 2.096 | Grad_l2 --> 0.374 | Weights_l2 --> 8951.146 | Lr --> 0.007 | Seconds_per_step --> 3.396 | [2024-08-10 11:39:29,849][Main][INFO] - [train] Step 24300 out of 80000 | Loss --> 2.090 | Grad_l2 --> 0.373 | Weights_l2 --> 8951.859 | Lr --> 0.007 | Seconds_per_step --> 3.383 | [2024-08-10 11:42:19,157][Main][INFO] - [train] Step 24350 out of 80000 | Loss --> 2.097 | Grad_l2 --> 0.371 | Weights_l2 --> 8952.607 | Lr --> 0.007 | Seconds_per_step --> 3.386 | [2024-08-10 11:45:08,412][Main][INFO] - [train] Step 24400 out of 80000 | Loss --> 2.094 | Grad_l2 --> 0.372 | Weights_l2 --> 8953.362 | Lr --> 0.007 | Seconds_per_step --> 3.385 | [2024-08-10 11:47:57,713][Main][INFO] - [train] Step 24450 out of 80000 | Loss --> 2.091 | Grad_l2 --> 0.375 | Weights_l2 --> 8954.094 | Lr --> 0.007 | Seconds_per_step --> 3.386 | [2024-08-10 11:50:46,406][Main][INFO] - [train] Step 24500 out of 80000 | Loss --> 2.100 | Grad_l2 --> 0.369 | Weights_l2 --> 8954.854 | Lr --> 0.007 | Seconds_per_step --> 3.374 | [2024-08-10 11:53:35,339][Main][INFO] - [train] Step 24550 out of 80000 | Loss --> 2.110 | Grad_l2 --> 0.374 | Weights_l2 --> 8955.580 | Lr --> 0.007 | Seconds_per_step --> 3.379 | [2024-08-10 11:56:24,268][Main][INFO] - [train] Step 24600 out of 80000 | Loss --> 2.104 | Grad_l2 --> 0.375 | Weights_l2 --> 8956.344 | Lr --> 0.007 | Seconds_per_step --> 3.379 | [2024-08-10 11:59:13,863][Main][INFO] - [train] Step 24650 out of 80000 | Loss --> 2.103 | Grad_l2 --> 0.376 | Weights_l2 --> 8957.068 | Lr --> 0.007 | Seconds_per_step --> 3.392 | [2024-08-10 12:02:03,598][Main][INFO] - [train] Step 24700 out of 80000 | Loss --> 2.106 | Grad_l2 --> 0.370 | Weights_l2 --> 8957.814 | Lr --> 0.007 | Seconds_per_step --> 3.395 | [2024-08-10 12:04:52,269][Main][INFO] - [train] Step 24750 out of 80000 | Loss --> 2.107 | Grad_l2 --> 0.365 | 
Weights_l2 --> 8958.570 | Lr --> 0.007 | Seconds_per_step --> 3.373 | [2024-08-10 12:07:41,278][Main][INFO] - [train] Step 24800 out of 80000 | Loss --> 2.114 | Grad_l2 --> 0.373 | Weights_l2 --> 8959.279 | Lr --> 0.007 | Seconds_per_step --> 3.380 | [2024-08-10 12:10:31,555][Main][INFO] - [train] Step 24850 out of 80000 | Loss --> 2.110 | Grad_l2 --> 0.369 | Weights_l2 --> 8960.027 | Lr --> 0.007 | Seconds_per_step --> 3.406 | [2024-08-10 12:13:21,204][Main][INFO] - [train] Step 24900 out of 80000 | Loss --> 2.102 | Grad_l2 --> 0.372 | Weights_l2 --> 8960.746 | Lr --> 0.007 | Seconds_per_step --> 3.393 | [2024-08-10 12:16:10,885][Main][INFO] - [train] Step 24950 out of 80000 | Loss --> 2.114 | Grad_l2 --> 0.370 | Weights_l2 --> 8961.486 | Lr --> 0.007 | Seconds_per_step --> 3.394 | [2024-08-10 12:19:00,451][Main][INFO] - [train] Step 25000 out of 80000 | Loss --> 2.113 | Grad_l2 --> 0.372 | Weights_l2 --> 8962.205 | Lr --> 0.007 | Seconds_per_step --> 3.391 | [2024-08-10 12:19:00,451][accelerate.accelerator][INFO] - Saving current state to checkpoint-pt-25000 [2024-08-10 12:19:00,454][accelerate.utils.other][WARNING] - Removed shared tensor {'encoder.embed_tokens.weight', 'decoder.embed_tokens.weight'} while saving. 
This should be OK, but check by verifying that you don't receive any warning while reloading
[2024-08-10 12:19:02,584][accelerate.checkpointing][INFO] - Model weights saved in checkpoint-pt-25000/model.safetensors
[2024-08-10 12:19:05,471][accelerate.checkpointing][INFO] - Optimizer state saved in checkpoint-pt-25000/optimizer.bin
[2024-08-10 12:19:05,472][accelerate.checkpointing][INFO] - Scheduler state saved in checkpoint-pt-25000/scheduler.bin
[2024-08-10 12:19:05,472][accelerate.checkpointing][INFO] - Sampler state for dataloader 0 saved in checkpoint-pt-25000/sampler.bin
[2024-08-10 12:19:05,472][accelerate.checkpointing][INFO] - Sampler state for dataloader 1 saved in checkpoint-pt-25000/sampler_1.bin
[2024-08-10 12:19:05,473][accelerate.checkpointing][INFO] - Random states saved in checkpoint-pt-25000/random_states_0.pkl
[2024-08-10 12:21:55,414][Main][INFO] - [train] Step 25050 out of 80000 | Loss --> 2.117 | Grad_l2 --> 0.368 | Weights_l2 --> 8962.926 | Lr --> 0.007 | Seconds_per_step --> 3.499 |
[2024-08-10 12:24:44,641][Main][INFO] - [train] Step 25100 out of 80000 | Loss --> 2.108 | Grad_l2 --> 0.368 | Weights_l2 --> 8963.658 | Lr --> 0.007 | Seconds_per_step --> 3.385 |
[2024-08-10 12:27:33,678][Main][INFO] - [train] Step 25150 out of 80000 | Loss --> 2.104 | Grad_l2 --> 0.370 | Weights_l2 --> 8964.369 | Lr --> 0.007 | Seconds_per_step --> 3.381 |
[2024-08-10 12:30:22,703][Main][INFO] - [train] Step 25200 out of 80000 | Loss --> 2.102 | Grad_l2 --> 0.367 | Weights_l2 --> 8965.077 | Lr --> 0.007 | Seconds_per_step --> 3.380 |
[2024-08-10 12:33:12,286][Main][INFO] - [train] Step 25250 out of 80000 | Loss --> 2.108 | Grad_l2 --> 0.367 | Weights_l2 --> 8965.794 | Lr --> 0.007 | Seconds_per_step --> 3.392 |
[2024-08-10 12:36:00,779][Main][INFO] - [train] Step 25300 out of 80000 | Loss --> 2.107 | Grad_l2 --> 0.367 | Weights_l2 --> 8966.528 | Lr --> 0.007 | Seconds_per_step --> 3.370 |
[2024-08-10 12:38:48,971][Main][INFO] - [train] Step 25350 out of 80000 | Loss --> 2.107 | Grad_l2 --> 0.364 | Weights_l2 --> 8967.235 | Lr --> 0.007 | Seconds_per_step --> 3.364 |
[2024-08-10 12:41:37,429][Main][INFO] - [train] Step 25400 out of 80000 | Loss --> 2.117 | Grad_l2 --> 0.363 | Weights_l2 --> 8967.925 | Lr --> 0.007 | Seconds_per_step --> 3.369 |
[2024-08-10 12:44:26,521][Main][INFO] - [train] Step 25450 out of 80000 | Loss --> 2.110 | Grad_l2 --> 0.371 | Weights_l2 --> 8968.626 | Lr --> 0.007 | Seconds_per_step --> 3.382 |
[2024-08-10 12:47:15,850][Main][INFO] - [train] Step 25500 out of 80000 | Loss --> 2.113 | Grad_l2 --> 0.368 | Weights_l2 --> 8969.323 | Lr --> 0.007 | Seconds_per_step --> 3.387 |
[2024-08-10 12:50:05,229][Main][INFO] - [train] Step 25550 out of 80000 | Loss --> 2.106 | Grad_l2 --> 0.362 | Weights_l2 --> 8970.029 | Lr --> 0.007 | Seconds_per_step --> 3.388 |
[2024-08-10 12:52:54,821][Main][INFO] - [train] Step 25600 out of 80000 | Loss --> 2.112 | Grad_l2 --> 0.365 | Weights_l2 --> 8970.711 | Lr --> 0.007 | Seconds_per_step --> 3.392 |
[2024-08-10 12:55:44,920][Main][INFO] - [train] Step 25650 out of 80000 | Loss --> 2.116 | Grad_l2 --> 0.366 | Weights_l2 --> 8971.399 | Lr --> 0.007 | Seconds_per_step --> 3.402 |
[2024-08-10 12:58:32,938][Main][INFO] - [train] Step 25700 out of 80000 | Loss --> 2.114 | Grad_l2 --> 0.364 | Weights_l2 --> 8972.067 | Lr --> 0.007 | Seconds_per_step --> 3.360 |
[2024-08-10 13:01:22,907][Main][INFO] - [train] Step 25750 out of 80000 | Loss --> 2.124 | Grad_l2 --> 0.365 | Weights_l2 --> 8972.769 | Lr --> 0.007 | Seconds_per_step --> 3.399 |
[2024-08-10 13:04:12,153][Main][INFO] - [train] Step 25800 out of 80000 | Loss --> 2.116 | Grad_l2 --> 0.365 | Weights_l2 --> 8973.450 | Lr --> 0.007 | Seconds_per_step --> 3.385 |
[2024-08-10 13:07:02,172][Main][INFO] - [train] Step 25850 out of 80000 | Loss --> 2.118 | Grad_l2 --> 0.367 | Weights_l2 --> 8974.127 | Lr --> 0.007 | Seconds_per_step --> 3.400 |
[2024-08-10 13:09:51,422][Main][INFO] - [train] Step 25900 out of 80000 | Loss --> 2.117 | Grad_l2 --> 0.365 | Weights_l2 --> 8974.808 | Lr --> 0.007 | Seconds_per_step --> 3.385 |
[2024-08-10 13:12:40,893][Main][INFO] - [train] Step 25950 out of 80000 | Loss --> 2.119 | Grad_l2 --> 0.367 | Weights_l2 --> 8975.499 | Lr --> 0.007 | Seconds_per_step --> 3.389 |
[2024-08-10 13:15:30,193][Main][INFO] - [train] Step 26000 out of 80000 | Loss --> 2.117 | Grad_l2 --> 0.365 | Weights_l2 --> 8976.191 | Lr --> 0.007 | Seconds_per_step --> 3.386 |
[2024-08-10 13:18:19,215][Main][INFO] - [train] Step 26050 out of 80000 | Loss --> 2.105 | Grad_l2 --> 0.366 | Weights_l2 --> 8976.887 | Lr --> 0.007 | Seconds_per_step --> 3.380 |
[2024-08-10 13:21:09,000][Main][INFO] - [train] Step 26100 out of 80000 | Loss --> 2.127 | Grad_l2 --> 0.367 | Weights_l2 --> 8977.570 | Lr --> 0.007 | Seconds_per_step --> 3.396 |
[2024-08-10 13:23:57,054][Main][INFO] - [train] Step 26150 out of 80000 | Loss --> 2.112 | Grad_l2 --> 0.365 | Weights_l2 --> 8978.248 | Lr --> 0.007 | Seconds_per_step --> 3.361 |
[2024-08-10 13:26:46,324][Main][INFO] - [train] Step 26200 out of 80000 | Loss --> 2.119 | Grad_l2 --> 0.362 | Weights_l2 --> 8978.920 | Lr --> 0.007 | Seconds_per_step --> 3.385 |
[2024-08-10 13:29:35,823][Main][INFO] - [train] Step 26250 out of 80000 | Loss --> 2.121 | Grad_l2 --> 0.360 | Weights_l2 --> 8979.595 | Lr --> 0.007 | Seconds_per_step --> 3.390 |
[2024-08-10 13:32:25,900][Main][INFO] - [train] Step 26300 out of 80000 | Loss --> 2.117 | Grad_l2 --> 0.360 | Weights_l2 --> 8980.264 | Lr --> 0.007 | Seconds_per_step --> 3.402 |
[2024-08-10 13:35:14,993][Main][INFO] - [train] Step 26350 out of 80000 | Loss --> 2.111 | Grad_l2 --> 0.364 | Weights_l2 --> 8980.936 | Lr --> 0.007 | Seconds_per_step --> 3.382 |
[2024-08-10 13:38:03,528][Main][INFO] - [train] Step 26400 out of 80000 | Loss --> 2.123 | Grad_l2 --> 0.364 | Weights_l2 --> 8981.629 | Lr --> 0.007 | Seconds_per_step --> 3.371 |
[2024-08-10 13:40:55,925][Main][INFO] - [train] Step 26450 out of 80000 | Loss --> 2.111 | Grad_l2 --> 0.363 | Weights_l2 --> 8982.292 | Lr --> 0.007 | Seconds_per_step --> 3.448 |
[2024-08-10 13:43:46,071][Main][INFO] - [train] Step 26500 out of 80000 | Loss --> 2.122 | Grad_l2 --> 0.360 | Weights_l2 --> 8982.970 | Lr --> 0.007 | Seconds_per_step --> 3.403 |
[2024-08-10 13:46:36,135][Main][INFO] - [train] Step 26550 out of 80000 | Loss --> 2.118 | Grad_l2 --> 0.362 | Weights_l2 --> 8983.635 | Lr --> 0.007 | Seconds_per_step --> 3.401 |
[2024-08-10 13:49:25,847][Main][INFO] - [train] Step 26600 out of 80000 | Loss --> 2.119 | Grad_l2 --> 0.359 | Weights_l2 --> 8984.271 | Lr --> 0.007 | Seconds_per_step --> 3.394 |
[2024-08-10 13:52:14,646][Main][INFO] - [train] Step 26650 out of 80000 | Loss --> 2.121 | Grad_l2 --> 0.359 | Weights_l2 --> 8984.935 | Lr --> 0.007 | Seconds_per_step --> 3.376 |
[2024-08-10 13:55:04,238][Main][INFO] - [train] Step 26700 out of 80000 | Loss --> 2.119 | Grad_l2 --> 0.361 | Weights_l2 --> 8985.600 | Lr --> 0.007 | Seconds_per_step --> 3.392 |
[2024-08-10 13:57:52,429][Main][INFO] - [train] Step 26750 out of 80000 | Loss --> 2.117 | Grad_l2 --> 0.358 | Weights_l2 --> 8986.254 | Lr --> 0.007 | Seconds_per_step --> 3.364 |
[2024-08-10 14:00:40,663][Main][INFO] - [train] Step 26800 out of 80000 | Loss --> 2.120 | Grad_l2 --> 0.358 | Weights_l2 --> 8986.901 | Lr --> 0.007 | Seconds_per_step --> 3.365 |
[2024-08-10 14:03:30,358][Main][INFO] - [train] Step 26850 out of 80000 | Loss --> 2.114 | Grad_l2 --> 0.356 | Weights_l2 --> 8987.561 | Lr --> 0.007 | Seconds_per_step --> 3.394 |
[2024-08-10 14:06:20,974][Main][INFO] - [train] Step 26900 out of 80000 | Loss --> 2.107 | Grad_l2 --> 0.358 | Weights_l2 --> 8988.213 | Lr --> 0.007 | Seconds_per_step --> 3.412 |
[2024-08-10 14:09:10,794][Main][INFO] - [train] Step 26950 out of 80000 | Loss --> 2.115 | Grad_l2 --> 0.355 | Weights_l2 --> 8988.873 | Lr --> 0.007 | Seconds_per_step --> 3.396 |
[2024-08-10 14:11:59,378][Main][INFO] - [train] Step 27000 out of 80000 | Loss --> 2.114 | Grad_l2 --> 0.356 | Weights_l2 --> 8989.515 | Lr --> 0.007 | Seconds_per_step --> 3.372 |