[2025-03-28 17:53:30,457][2713170] Saving configuration to /home/stalaei/projects/deep-rl-class/notebooks/unit8/train_dir/default_experiment/config.json...
[2025-03-28 17:53:30,531][2713170] Rollout worker 0 uses device cpu
[2025-03-28 17:53:30,532][2713170] Rollout worker 1 uses device cpu
[2025-03-28 17:53:30,533][2713170] Rollout worker 2 uses device cpu
[2025-03-28 17:53:30,533][2713170] Rollout worker 3 uses device cpu
[2025-03-28 17:53:30,534][2713170] Rollout worker 4 uses device cpu
[2025-03-28 17:53:30,535][2713170] Rollout worker 5 uses device cpu
[2025-03-28 17:53:30,536][2713170] Rollout worker 6 uses device cpu
[2025-03-28 17:53:30,537][2713170] Rollout worker 7 uses device cpu
[2025-03-28 17:53:30,612][2713170] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-03-28 17:53:30,613][2713170] InferenceWorker_p0-w0: min num requests: 2
[2025-03-28 17:53:30,653][2713170] Starting all processes...
[2025-03-28 17:53:30,654][2713170] Starting process learner_proc0
[2025-03-28 17:53:31,157][2713170] Starting all processes...
[2025-03-28 17:53:31,169][2713170] Starting process inference_proc0-0
[2025-03-28 17:53:31,171][2713170] Starting process rollout_proc0
[2025-03-28 17:53:31,171][2713170] Starting process rollout_proc1
[2025-03-28 17:53:31,172][2713170] Starting process rollout_proc2
[2025-03-28 17:53:31,172][2713170] Starting process rollout_proc3
[2025-03-28 17:53:31,173][2713170] Starting process rollout_proc4
[2025-03-28 17:53:31,173][2713170] Starting process rollout_proc5
[2025-03-28 17:53:31,173][2713170] Starting process rollout_proc6
[2025-03-28 17:53:31,173][2713170] Starting process rollout_proc7
[2025-03-28 17:53:34,373][2730012] Worker 2 uses CPU cores [64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95]
[2025-03-28 17:53:34,373][2730019] Worker 4 uses CPU cores [128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159]
[2025-03-28 17:53:34,373][2730011] Worker 0 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31]
[2025-03-28 17:53:34,375][2730010] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-03-28 17:53:34,376][2730010] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2025-03-28 17:53:34,383][2730024] Worker 7 uses CPU cores [224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255]
[2025-03-28 17:53:34,385][2730021] Worker 3 uses CPU cores [96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127]
[2025-03-28 17:53:34,387][2730020] Worker 5 uses CPU cores [160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191]
[2025-03-28 17:53:34,400][2730023] Worker 6 uses CPU cores [192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223]
[2025-03-28 17:53:34,402][2729989] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-03-28 17:53:34,402][2729989] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2025-03-28 17:53:34,402][2730013] Worker 1 uses CPU cores [32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63]
[2025-03-28 17:53:34,904][2730010] Num visible devices: 1
[2025-03-28 17:53:34,906][2729989] Num visible devices: 1
[2025-03-28 17:53:34,907][2729989] Starting seed is not provided
[2025-03-28 17:53:34,907][2729989] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-03-28 17:53:34,908][2729989] Initializing actor-critic model on device cuda:0
[2025-03-28 17:53:34,908][2729989] RunningMeanStd input shape: (3, 72, 128)
[2025-03-28 17:53:34,938][2729989] RunningMeanStd input shape: (1,)
[2025-03-28 17:53:34,951][2729989] ConvEncoder: input_channels=3
[2025-03-28 17:53:35,102][2729989] Conv encoder output size: 512
[2025-03-28 17:53:35,102][2729989] Policy head output size: 512
[2025-03-28 17:53:35,116][2729989] Created Actor Critic model with architecture:
[2025-03-28 17:53:35,116][2729989] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2025-03-28 17:53:35,564][2729989] Using optimizer <class 'torch.optim.adam.Adam'>
[2025-03-28 17:53:37,203][2729989] No checkpoints found
[2025-03-28 17:53:37,204][2729989] Did not load from checkpoint, starting from scratch!
[2025-03-28 17:53:37,204][2729989] Initialized policy 0 weights for model version 0
[2025-03-28 17:53:37,506][2729989] LearnerWorker_p0 finished initialization!
[2025-03-28 17:53:37,506][2729989] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-03-28 17:53:37,835][2713170] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2025-03-28 17:53:37,878][2730010] RunningMeanStd input shape: (3, 72, 128)
[2025-03-28 17:53:37,906][2730010] RunningMeanStd input shape: (1,)
[2025-03-28 17:53:37,916][2730010] ConvEncoder: input_channels=3
[2025-03-28 17:53:38,004][2730010] Conv encoder output size: 512
[2025-03-28 17:53:38,004][2730010] Policy head output size: 512
[2025-03-28 17:53:38,123][2713170] Inference worker 0-0 is ready!
[2025-03-28 17:53:38,124][2713170] All inference workers are ready! Signal rollout workers to start!
[2025-03-28 17:53:38,153][2730019] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-03-28 17:53:38,158][2730024] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-03-28 17:53:38,159][2730020] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-03-28 17:53:38,159][2730013] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-03-28 17:53:38,169][2730011] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-03-28 17:53:38,175][2730012] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-03-28 17:53:38,175][2730021] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-03-28 17:53:38,176][2730023] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-03-28 17:53:38,718][2730020] Decorrelating experience for 0 frames...
[2025-03-28 17:53:38,718][2730013] Decorrelating experience for 0 frames...
[2025-03-28 17:53:38,718][2730012] Decorrelating experience for 0 frames...
[2025-03-28 17:53:38,718][2730019] Decorrelating experience for 0 frames...
[2025-03-28 17:53:38,718][2730024] Decorrelating experience for 0 frames...
[2025-03-28 17:53:38,718][2730011] Decorrelating experience for 0 frames...
[2025-03-28 17:53:39,141][2730024] Decorrelating experience for 32 frames...
[2025-03-28 17:53:39,153][2730021] Decorrelating experience for 0 frames...
[2025-03-28 17:53:39,165][2730013] Decorrelating experience for 32 frames...
[2025-03-28 17:53:39,166][2730020] Decorrelating experience for 32 frames...
[2025-03-28 17:53:39,167][2730019] Decorrelating experience for 32 frames...
[2025-03-28 17:53:39,169][2730012] Decorrelating experience for 32 frames...
[2025-03-28 17:53:39,172][2730023] Decorrelating experience for 0 frames...
[2025-03-28 17:53:39,562][2730021] Decorrelating experience for 32 frames...
[2025-03-28 17:53:39,581][2730023] Decorrelating experience for 32 frames...
[2025-03-28 17:53:39,582][2730011] Decorrelating experience for 32 frames...
[2025-03-28 17:53:39,597][2730024] Decorrelating experience for 64 frames...
[2025-03-28 17:53:39,624][2730019] Decorrelating experience for 64 frames...
[2025-03-28 17:53:39,635][2730012] Decorrelating experience for 64 frames...
[2025-03-28 17:53:39,641][2730020] Decorrelating experience for 64 frames...
[2025-03-28 17:53:39,641][2730013] Decorrelating experience for 64 frames...
[2025-03-28 17:53:40,038][2730021] Decorrelating experience for 64 frames...
[2025-03-28 17:53:40,052][2730024] Decorrelating experience for 96 frames...
[2025-03-28 17:53:40,052][2730011] Decorrelating experience for 64 frames...
[2025-03-28 17:53:40,064][2730012] Decorrelating experience for 96 frames...
[2025-03-28 17:53:40,454][2730023] Decorrelating experience for 64 frames...
[2025-03-28 17:53:40,485][2730021] Decorrelating experience for 96 frames...
[2025-03-28 17:53:40,486][2730013] Decorrelating experience for 96 frames...
[2025-03-28 17:53:40,486][2730011] Decorrelating experience for 96 frames...
[2025-03-28 17:53:40,852][2730019] Decorrelating experience for 96 frames...
[2025-03-28 17:53:40,913][2730023] Decorrelating experience for 96 frames...
[2025-03-28 17:53:40,921][2730020] Decorrelating experience for 96 frames...
[2025-03-28 17:53:41,244][2729989] Signal inference workers to stop experience collection...
[2025-03-28 17:53:41,258][2730010] InferenceWorker_p0-w0: stopping experience collection
[2025-03-28 17:53:42,835][2713170] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 318.4. Samples: 1592. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2025-03-28 17:53:42,837][2713170] Avg episode reward: [(0, '2.205')]
[2025-03-28 17:53:43,658][2729989] Signal inference workers to resume experience collection...
[2025-03-28 17:53:43,659][2730010] InferenceWorker_p0-w0: resuming experience collection
[2025-03-28 17:53:44,998][2730010] Updated weights for policy 0, policy_version 10 (0.0066)
[2025-03-28 17:53:46,399][2730010] Updated weights for policy 0, policy_version 20 (0.0007)
[2025-03-28 17:53:47,835][2713170] Fps is (10 sec: 10240.0, 60 sec: 10240.0, 300 sec: 10240.0). Total num frames: 102400. Throughput: 0: 2609.0. Samples: 26090. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-28 17:53:47,837][2713170] Avg episode reward: [(0, '4.093')]
[2025-03-28 17:53:47,839][2729989] Saving new best policy, reward=4.093!
[2025-03-28 17:53:49,331][2730010] Updated weights for policy 0, policy_version 30 (0.0007)
[2025-03-28 17:53:50,602][2713170] Heartbeat connected on Batcher_0
[2025-03-28 17:53:50,606][2713170] Heartbeat connected on LearnerWorker_p0
[2025-03-28 17:53:50,614][2713170] Heartbeat connected on InferenceWorker_p0-w0
[2025-03-28 17:53:50,619][2713170] Heartbeat connected on RolloutWorker_w0
[2025-03-28 17:53:50,624][2713170] Heartbeat connected on RolloutWorker_w1
[2025-03-28 17:53:50,629][2713170] Heartbeat connected on RolloutWorker_w2
[2025-03-28 17:53:50,638][2713170] Heartbeat connected on RolloutWorker_w3
[2025-03-28 17:53:50,643][2713170] Heartbeat connected on RolloutWorker_w4
[2025-03-28 17:53:50,644][2713170] Heartbeat connected on RolloutWorker_w5
[2025-03-28 17:53:50,651][2713170] Heartbeat connected on RolloutWorker_w6
[2025-03-28 17:53:50,657][2713170] Heartbeat connected on RolloutWorker_w7
[2025-03-28 17:53:50,841][2730010] Updated weights for policy 0, policy_version 40 (0.0007)
[2025-03-28 17:53:52,230][2730010] Updated weights for policy 0, policy_version 50 (0.0007)
[2025-03-28 17:53:52,835][2713170] Fps is (10 sec: 22118.6, 60 sec: 14745.6, 300 sec: 14745.6). Total num frames: 221184. Throughput: 0: 2444.8. Samples: 36672. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2025-03-28 17:53:52,837][2713170] Avg episode reward: [(0, '4.486')]
[2025-03-28 17:53:52,843][2729989] Saving new best policy, reward=4.486!
[2025-03-28 17:53:53,668][2730010] Updated weights for policy 0, policy_version 60 (0.0007)
[2025-03-28 17:53:55,180][2730010] Updated weights for policy 0, policy_version 70 (0.0006)
[2025-03-28 17:53:56,687][2730010] Updated weights for policy 0, policy_version 80 (0.0006)
[2025-03-28 17:53:57,835][2713170] Fps is (10 sec: 25395.3, 60 sec: 17817.7, 300 sec: 17817.7). Total num frames: 356352. Throughput: 0: 3934.8. Samples: 78696. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-03-28 17:53:57,836][2713170] Avg episode reward: [(0, '4.392')]
[2025-03-28 17:53:58,253][2730010] Updated weights for policy 0, policy_version 90 (0.0006)
[2025-03-28 17:53:59,741][2730010] Updated weights for policy 0, policy_version 100 (0.0007)
[2025-03-28 17:54:01,248][2730010] Updated weights for policy 0, policy_version 110 (0.0006)
[2025-03-28 17:54:02,795][2730010] Updated weights for policy 0, policy_version 120 (0.0006)
[2025-03-28 17:54:02,835][2713170] Fps is (10 sec: 27033.9, 60 sec: 19660.9, 300 sec: 19660.9). Total num frames: 491520. Throughput: 0: 4778.9. Samples: 119472. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-03-28 17:54:02,836][2713170] Avg episode reward: [(0, '4.558')]
[2025-03-28 17:54:02,841][2729989] Saving new best policy, reward=4.558!
[2025-03-28 17:54:04,316][2730010] Updated weights for policy 0, policy_version 130 (0.0006)
[2025-03-28 17:54:05,799][2730010] Updated weights for policy 0, policy_version 140 (0.0006)
[2025-03-28 17:54:07,289][2730010] Updated weights for policy 0, policy_version 150 (0.0006)
[2025-03-28 17:54:07,835][2713170] Fps is (10 sec: 27033.4, 60 sec: 20889.6, 300 sec: 20889.6). Total num frames: 626688. Throughput: 0: 4649.3. Samples: 139478. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-03-28 17:54:07,837][2713170] Avg episode reward: [(0, '4.413')]
[2025-03-28 17:54:08,812][2730010] Updated weights for policy 0, policy_version 160 (0.0006)
[2025-03-28 17:54:10,255][2730010] Updated weights for policy 0, policy_version 170 (0.0006)
[2025-03-28 17:54:11,700][2730010] Updated weights for policy 0, policy_version 180 (0.0006)
[2025-03-28 17:54:12,835][2713170] Fps is (10 sec: 27442.8, 60 sec: 21884.4, 300 sec: 21884.4). Total num frames: 765952. Throughput: 0: 5168.9. Samples: 180910. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-28 17:54:12,843][2713170] Avg episode reward: [(0, '4.549')]
[2025-03-28 17:54:13,151][2730010] Updated weights for policy 0, policy_version 190 (0.0006)
[2025-03-28 17:54:14,627][2730010] Updated weights for policy 0, policy_version 200 (0.0006)
[2025-03-28 17:54:16,062][2730010] Updated weights for policy 0, policy_version 210 (0.0006)
[2025-03-28 17:54:17,334][2730010] Updated weights for policy 0, policy_version 220 (0.0007)
[2025-03-28 17:54:17,835][2713170] Fps is (10 sec: 28672.2, 60 sec: 22835.3, 300 sec: 22835.3). Total num frames: 913408. Throughput: 0: 5615.0. Samples: 224598. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-28 17:54:17,837][2713170] Avg episode reward: [(0, '5.481')]
[2025-03-28 17:54:17,839][2729989] Saving new best policy, reward=5.481!
[2025-03-28 17:54:18,695][2730010] Updated weights for policy 0, policy_version 230 (0.0006)
[2025-03-28 17:54:20,104][2730010] Updated weights for policy 0, policy_version 240 (0.0006)
[2025-03-28 17:54:21,638][2730010] Updated weights for policy 0, policy_version 250 (0.0006)
[2025-03-28 17:54:22,835][2713170] Fps is (10 sec: 28671.9, 60 sec: 23392.7, 300 sec: 23392.7). Total num frames: 1052672. Throughput: 0: 5479.4. Samples: 246574. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2025-03-28 17:54:22,837][2713170] Avg episode reward: [(0, '5.019')]
[2025-03-28 17:54:23,212][2730010] Updated weights for policy 0, policy_version 260 (0.0006)
[2025-03-28 17:54:24,681][2730010] Updated weights for policy 0, policy_version 270 (0.0006)
[2025-03-28 17:54:26,137][2730010] Updated weights for policy 0, policy_version 280 (0.0006)
[2025-03-28 17:54:27,609][2730010] Updated weights for policy 0, policy_version 290 (0.0006)
[2025-03-28 17:54:27,835][2713170] Fps is (10 sec: 27852.1, 60 sec: 23838.6, 300 sec: 23838.6). Total num frames: 1191936. Throughput: 0: 6351.0. Samples: 287386. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-28 17:54:27,837][2713170] Avg episode reward: [(0, '6.011')]
[2025-03-28 17:54:27,838][2729989] Saving new best policy, reward=6.011!
[2025-03-28 17:54:28,987][2730010] Updated weights for policy 0, policy_version 300 (0.0006)
[2025-03-28 17:54:30,156][2730010] Updated weights for policy 0, policy_version 310 (0.0007)
[2025-03-28 17:54:31,639][2730010] Updated weights for policy 0, policy_version 320 (0.0006)
[2025-03-28 17:54:32,836][2713170] Fps is (10 sec: 29081.4, 60 sec: 24427.0, 300 sec: 24427.0). Total num frames: 1343488. Throughput: 0: 6796.3. Samples: 331926. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-28 17:54:32,839][2713170] Avg episode reward: [(0, '6.494')]
[2025-03-28 17:54:32,845][2729989] Saving new best policy, reward=6.494!
[2025-03-28 17:54:33,131][2730010] Updated weights for policy 0, policy_version 330 (0.0006)
[2025-03-28 17:54:34,679][2730010] Updated weights for policy 0, policy_version 340 (0.0006)
[2025-03-28 17:54:36,210][2730010] Updated weights for policy 0, policy_version 350 (0.0006)
[2025-03-28 17:54:37,735][2730010] Updated weights for policy 0, policy_version 360 (0.0006)
[2025-03-28 17:54:37,835][2713170] Fps is (10 sec: 28262.9, 60 sec: 24576.0, 300 sec: 24576.0). Total num frames: 1474560. Throughput: 0: 7006.5. Samples: 351964. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2025-03-28 17:54:37,837][2713170] Avg episode reward: [(0, '6.363')]
[2025-03-28 17:54:39,290][2730010] Updated weights for policy 0, policy_version 370 (0.0006)
[2025-03-28 17:54:40,834][2730010] Updated weights for policy 0, policy_version 380 (0.0006)
[2025-03-28 17:54:42,394][2730010] Updated weights for policy 0, policy_version 390 (0.0006)
[2025-03-28 17:54:42,835][2713170] Fps is (10 sec: 26214.6, 60 sec: 26760.5, 300 sec: 24702.0). Total num frames: 1605632. Throughput: 0: 6955.8. Samples: 391708. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-28 17:54:42,837][2713170] Avg episode reward: [(0, '6.408')]
[2025-03-28 17:54:43,936][2730010] Updated weights for policy 0, policy_version 400 (0.0006)
[2025-03-28 17:54:45,513][2730010] Updated weights for policy 0, policy_version 410 (0.0006)
[2025-03-28 17:54:47,039][2730010] Updated weights for policy 0, policy_version 420 (0.0006)
[2025-03-28 17:54:47,836][2713170] Fps is (10 sec: 26622.9, 60 sec: 27306.5, 300 sec: 24868.4). Total num frames: 1740800. Throughput: 0: 6930.5. Samples: 431348. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-03-28 17:54:47,840][2713170] Avg episode reward: [(0, '6.476')]
[2025-03-28 17:54:48,561][2730010] Updated weights for policy 0, policy_version 430 (0.0006)
[2025-03-28 17:54:50,114][2730010] Updated weights for policy 0, policy_version 440 (0.0006)
[2025-03-28 17:54:51,659][2730010] Updated weights for policy 0, policy_version 450 (0.0007)
[2025-03-28 17:54:52,835][2713170] Fps is (10 sec: 26624.0, 60 sec: 27511.4, 300 sec: 24958.3). Total num frames: 1871872. Throughput: 0: 6930.8. Samples: 451366. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-28 17:54:52,837][2713170] Avg episode reward: [(0, '7.730')]
[2025-03-28 17:54:52,845][2729989] Saving new best policy, reward=7.730!
[2025-03-28 17:54:53,179][2730010] Updated weights for policy 0, policy_version 460 (0.0006)
[2025-03-28 17:54:54,724][2730010] Updated weights for policy 0, policy_version 470 (0.0006)
[2025-03-28 17:54:56,274][2730010] Updated weights for policy 0, policy_version 480 (0.0006)
[2025-03-28 17:54:57,752][2730010] Updated weights for policy 0, policy_version 490 (0.0006)
[2025-03-28 17:54:57,835][2713170] Fps is (10 sec: 26624.9, 60 sec: 27511.4, 300 sec: 25088.0). Total num frames: 2007040. Throughput: 0: 6897.6. Samples: 491302. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-28 17:54:57,837][2713170] Avg episode reward: [(0, '9.450')]
[2025-03-28 17:54:57,839][2729989] Saving new best policy, reward=9.450!
[2025-03-28 17:54:59,188][2730010] Updated weights for policy 0, policy_version 500 (0.0006)
[2025-03-28 17:55:00,652][2730010] Updated weights for policy 0, policy_version 510 (0.0006)
[2025-03-28 17:55:02,126][2730010] Updated weights for policy 0, policy_version 520 (0.0006)
[2025-03-28 17:55:02,835][2713170] Fps is (10 sec: 27443.3, 60 sec: 27579.7, 300 sec: 25250.6). Total num frames: 2146304. Throughput: 0: 6855.7. Samples: 533104. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-03-28 17:55:02,837][2713170] Avg episode reward: [(0, '9.217')]
[2025-03-28 17:55:03,594][2730010] Updated weights for policy 0, policy_version 530 (0.0006)
[2025-03-28 17:55:05,113][2730010] Updated weights for policy 0, policy_version 540 (0.0006)
[2025-03-28 17:55:06,639][2730010] Updated weights for policy 0, policy_version 550 (0.0006)
[2025-03-28 17:55:07,835][2713170] Fps is (10 sec: 27852.9, 60 sec: 27648.0, 300 sec: 25395.2). Total num frames: 2285568. Throughput: 0: 6827.4. Samples: 553806. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-28 17:55:07,836][2713170] Avg episode reward: [(0, '11.260')]
[2025-03-28 17:55:07,838][2729989] Saving new best policy, reward=11.260!
[2025-03-28 17:55:08,092][2730010] Updated weights for policy 0, policy_version 560 (0.0006)
[2025-03-28 17:55:09,608][2730010] Updated weights for policy 0, policy_version 570 (0.0006)
[2025-03-28 17:55:11,074][2730010] Updated weights for policy 0, policy_version 580 (0.0007)
[2025-03-28 17:55:12,569][2730010] Updated weights for policy 0, policy_version 590 (0.0006)
[2025-03-28 17:55:12,835][2713170] Fps is (10 sec: 27443.1, 60 sec: 27579.7, 300 sec: 25481.4). Total num frames: 2420736. Throughput: 0: 6836.8. Samples: 595042. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-03-28 17:55:12,837][2713170] Avg episode reward: [(0, '11.551')]
[2025-03-28 17:55:12,842][2729989] Saving new best policy, reward=11.551!
[2025-03-28 17:55:14,024][2730010] Updated weights for policy 0, policy_version 600 (0.0006)
[2025-03-28 17:55:15,515][2730010] Updated weights for policy 0, policy_version 610 (0.0006)
[2025-03-28 17:55:16,978][2730010] Updated weights for policy 0, policy_version 620 (0.0006)
[2025-03-28 17:55:17,835][2713170] Fps is (10 sec: 27443.2, 60 sec: 27443.2, 300 sec: 25600.0). Total num frames: 2560000. Throughput: 0: 6770.4. Samples: 636592. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-28 17:55:17,837][2713170] Avg episode reward: [(0, '11.011')]
[2025-03-28 17:55:18,528][2730010] Updated weights for policy 0, policy_version 630 (0.0006)
[2025-03-28 17:55:20,083][2730010] Updated weights for policy 0, policy_version 640 (0.0006)
[2025-03-28 17:55:21,579][2730010] Updated weights for policy 0, policy_version 650 (0.0006)
[2025-03-28 17:55:22,835][2713170] Fps is (10 sec: 27443.0, 60 sec: 27374.9, 300 sec: 25668.2). Total num frames: 2695168. Throughput: 0: 6764.1. Samples: 656350. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-28 17:55:22,837][2713170] Avg episode reward: [(0, '11.818')]
[2025-03-28 17:55:22,843][2729989] Saving /home/stalaei/projects/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000659_2699264.pth...
[2025-03-28 17:55:22,944][2729989] Saving new best policy, reward=11.818!
[2025-03-28 17:55:23,061][2730010] Updated weights for policy 0, policy_version 660 (0.0007)
[2025-03-28 17:55:24,335][2730010] Updated weights for policy 0, policy_version 670 (0.0007)
[2025-03-28 17:55:25,720][2730010] Updated weights for policy 0, policy_version 680 (0.0007)
[2025-03-28 17:55:27,211][2730010] Updated weights for policy 0, policy_version 690 (0.0007)
[2025-03-28 17:55:27,835][2713170] Fps is (10 sec: 27852.8, 60 sec: 27443.3, 300 sec: 25804.8). Total num frames: 2838528. Throughput: 0: 6846.9. Samples: 699820. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-03-28 17:55:27,842][2713170] Avg episode reward: [(0, '11.834')]
[2025-03-28 17:55:27,844][2729989] Saving new best policy, reward=11.834!
[2025-03-28 17:55:28,751][2730010] Updated weights for policy 0, policy_version 700 (0.0007)
[2025-03-28 17:55:30,279][2730010] Updated weights for policy 0, policy_version 710 (0.0007)
[2025-03-28 17:55:31,848][2730010] Updated weights for policy 0, policy_version 720 (0.0007)
[2025-03-28 17:55:32,835][2713170] Fps is (10 sec: 27853.0, 60 sec: 27170.2, 300 sec: 25858.2). Total num frames: 2973696. Throughput: 0: 6854.8. Samples: 739810. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-03-28 17:55:32,837][2713170] Avg episode reward: [(0, '13.856')]
[2025-03-28 17:55:32,843][2729989] Saving new best policy, reward=13.856!
[2025-03-28 17:55:33,369][2730010] Updated weights for policy 0, policy_version 730 (0.0007)
[2025-03-28 17:55:34,938][2730010] Updated weights for policy 0, policy_version 740 (0.0007)
[2025-03-28 17:55:36,456][2730010] Updated weights for policy 0, policy_version 750 (0.0007)
[2025-03-28 17:55:37,835][2713170] Fps is (10 sec: 26624.0, 60 sec: 27170.1, 300 sec: 25873.1). Total num frames: 3104768. Throughput: 0: 6853.6. Samples: 759780. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-03-28 17:55:37,837][2713170] Avg episode reward: [(0, '11.840')]
[2025-03-28 17:55:38,032][2730010] Updated weights for policy 0, policy_version 760 (0.0007)
[2025-03-28 17:55:39,566][2730010] Updated weights for policy 0, policy_version 770 (0.0007)
[2025-03-28 17:55:41,000][2730010] Updated weights for policy 0, policy_version 780 (0.0007)
[2025-03-28 17:55:42,482][2730010] Updated weights for policy 0, policy_version 790 (0.0006)
[2025-03-28 17:55:42,835][2713170] Fps is (10 sec: 27033.7, 60 sec: 27306.7, 300 sec: 25952.3). Total num frames: 3244032. Throughput: 0: 6868.9. Samples: 800404. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-28 17:55:42,836][2713170] Avg episode reward: [(0, '15.923')]
[2025-03-28 17:55:42,843][2729989] Saving new best policy, reward=15.923!
[2025-03-28 17:55:43,948][2730010] Updated weights for policy 0, policy_version 800 (0.0007)
[2025-03-28 17:55:45,425][2730010] Updated weights for policy 0, policy_version 810 (0.0006)
[2025-03-28 17:55:46,967][2730010] Updated weights for policy 0, policy_version 820 (0.0006)
[2025-03-28 17:55:47,835][2713170] Fps is (10 sec: 27443.2, 60 sec: 27306.8, 300 sec: 25993.8). Total num frames: 3379200. Throughput: 0: 6853.2. Samples: 841498. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-28 17:55:47,837][2713170] Avg episode reward: [(0, '17.706')]
[2025-03-28 17:55:47,838][2729989] Saving new best policy, reward=17.706!
[2025-03-28 17:55:48,475][2730010] Updated weights for policy 0, policy_version 830 (0.0006)
[2025-03-28 17:55:50,012][2730010] Updated weights for policy 0, policy_version 840 (0.0006)
[2025-03-28 17:55:51,474][2730010] Updated weights for policy 0, policy_version 850 (0.0006)
[2025-03-28 17:55:52,835][2713170] Fps is (10 sec: 27033.6, 60 sec: 27374.9, 300 sec: 26032.4). Total num frames: 3514368. Throughput: 0: 6842.1. Samples: 861700. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2025-03-28 17:55:52,837][2713170] Avg episode reward: [(0, '17.510')]
[2025-03-28 17:55:53,056][2730010] Updated weights for policy 0, policy_version 860 (0.0006)
[2025-03-28 17:55:54,494][2730010] Updated weights for policy 0, policy_version 870 (0.0006)
[2025-03-28 17:55:55,979][2730010] Updated weights for policy 0, policy_version 880 (0.0006)
[2025-03-28 17:55:57,517][2730010] Updated weights for policy 0, policy_version 890 (0.0006)
[2025-03-28 17:55:57,835][2713170] Fps is (10 sec: 27443.2, 60 sec: 27443.2, 300 sec: 26097.4). Total num frames: 3653632. Throughput: 0: 6839.1. Samples: 902802. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-28 17:55:57,837][2713170] Avg episode reward: [(0, '19.800')]
[2025-03-28 17:55:57,838][2729989] Saving new best policy, reward=19.800!
[2025-03-28 17:55:59,010][2730010] Updated weights for policy 0, policy_version 900 (0.0006)
[2025-03-28 17:56:00,480][2730010] Updated weights for policy 0, policy_version 910 (0.0006)
[2025-03-28 17:56:01,991][2730010] Updated weights for policy 0, policy_version 920 (0.0006)
[2025-03-28 17:56:02,835][2713170] Fps is (10 sec: 27443.1, 60 sec: 27374.9, 300 sec: 26129.6). Total num frames: 3788800. Throughput: 0: 6822.7. Samples: 943614. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-03-28 17:56:02,837][2713170] Avg episode reward: [(0, '16.817')]
[2025-03-28 17:56:03,498][2730010] Updated weights for policy 0, policy_version 930 (0.0006)
[2025-03-28 17:56:05,032][2730010] Updated weights for policy 0, policy_version 940 (0.0006)
[2025-03-28 17:56:06,454][2730010] Updated weights for policy 0, policy_version 950 (0.0006)
[2025-03-28 17:56:07,835][2713170] Fps is (10 sec: 27443.2, 60 sec: 27374.9, 300 sec: 26187.1). Total num frames: 3928064. Throughput: 0: 6839.6. Samples: 964132. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-03-28 17:56:07,837][2713170] Avg episode reward: [(0, '20.252')]
[2025-03-28 17:56:07,838][2729989] Saving new best policy, reward=20.252!
[2025-03-28 17:56:07,971][2730010] Updated weights for policy 0, policy_version 960 (0.0006)
[2025-03-28 17:56:09,442][2730010] Updated weights for policy 0, policy_version 970 (0.0006)
[2025-03-28 17:56:10,647][2729989] Stopping Batcher_0...
[2025-03-28 17:56:10,647][2729989] Saving /home/stalaei/projects/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-03-28 17:56:10,647][2713170] Component Batcher_0 stopped!
[2025-03-28 17:56:10,648][2729989] Loop batcher_evt_loop terminating...
[2025-03-28 17:56:10,680][2730010] Weights refcount: 2 0
[2025-03-28 17:56:10,738][2730013] Stopping RolloutWorker_w1...
[2025-03-28 17:56:10,739][2730013] Loop rollout_proc1_evt_loop terminating...
[2025-03-28 17:56:10,738][2713170] Component RolloutWorker_w1 stopped!
[2025-03-28 17:56:10,741][2730010] Stopping InferenceWorker_p0-w0...
[2025-03-28 17:56:10,741][2730010] Loop inference_proc0-0_evt_loop terminating...
[2025-03-28 17:56:10,743][2729989] Saving /home/stalaei/projects/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-03-28 17:56:10,742][2713170] Component InferenceWorker_p0-w0 stopped!
[2025-03-28 17:56:10,744][2713170] Component RolloutWorker_w2 stopped!
[2025-03-28 17:56:10,745][2730012] Stopping RolloutWorker_w2...
[2025-03-28 17:56:10,746][2730012] Loop rollout_proc2_evt_loop terminating...
[2025-03-28 17:56:10,746][2730020] Stopping RolloutWorker_w5...
[2025-03-28 17:56:10,747][2730020] Loop rollout_proc5_evt_loop terminating...
[2025-03-28 17:56:10,746][2713170] Component RolloutWorker_w6 stopped!
[2025-03-28 17:56:10,747][2730019] Stopping RolloutWorker_w4...
[2025-03-28 17:56:10,748][2730019] Loop rollout_proc4_evt_loop terminating...
[2025-03-28 17:56:10,746][2730023] Stopping RolloutWorker_w6...
[2025-03-28 17:56:10,749][2730023] Loop rollout_proc6_evt_loop terminating...
[2025-03-28 17:56:10,748][2730024] Stopping RolloutWorker_w7...
[2025-03-28 17:56:10,748][2730021] Stopping RolloutWorker_w3...
[2025-03-28 17:56:10,750][2730024] Loop rollout_proc7_evt_loop terminating...
[2025-03-28 17:56:10,750][2730021] Loop rollout_proc3_evt_loop terminating...
[2025-03-28 17:56:10,748][2713170] Component RolloutWorker_w5 stopped!
[2025-03-28 17:56:10,752][2713170] Component RolloutWorker_w4 stopped!
[2025-03-28 17:56:10,753][2713170] Component RolloutWorker_w3 stopped!
[2025-03-28 17:56:10,753][2713170] Component RolloutWorker_w7 stopped!
[2025-03-28 17:56:10,755][2730011] Stopping RolloutWorker_w0...
[2025-03-28 17:56:10,757][2730011] Loop rollout_proc0_evt_loop terminating...
[2025-03-28 17:56:10,755][2713170] Component RolloutWorker_w0 stopped!
[2025-03-28 17:56:10,818][2729989] Stopping LearnerWorker_p0...
[2025-03-28 17:56:10,818][2729989] Loop learner_proc0_evt_loop terminating...
[2025-03-28 17:56:10,818][2713170] Component LearnerWorker_p0 stopped!
[2025-03-28 17:56:10,820][2713170] Waiting for process learner_proc0 to stop...
[2025-03-28 17:56:11,703][2713170] Waiting for process inference_proc0-0 to join...
[2025-03-28 17:56:11,705][2713170] Waiting for process rollout_proc0 to join...
[2025-03-28 17:56:11,706][2713170] Waiting for process rollout_proc1 to join...
[2025-03-28 17:56:11,708][2713170] Waiting for process rollout_proc2 to join...
[2025-03-28 17:56:11,709][2713170] Waiting for process rollout_proc3 to join...
[2025-03-28 17:56:11,710][2713170] Waiting for process rollout_proc4 to join...
[2025-03-28 17:56:11,711][2713170] Waiting for process rollout_proc5 to join...
[2025-03-28 17:56:11,712][2713170] Waiting for process rollout_proc6 to join...
[2025-03-28 17:56:11,713][2713170] Waiting for process rollout_proc7 to join...
[2025-03-28 17:56:11,715][2713170] Batcher 0 profile tree view:
batching: 14.1429, releasing_batches: 0.0182
[2025-03-28 17:56:11,716][2713170] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0000
  wait_policy_total: 3.7221
update_model: 2.0099
  weight_update: 0.0006
one_step: 0.0015
  handle_policy_step: 137.5038
    deserialize: 9.3191, stack: 0.7309, obs_to_device_normalize: 29.5231, forward: 65.4827, send_messages: 7.6935
    prepare_outputs: 18.5681
      to_cpu: 11.0246
[2025-03-28 17:56:11,717][2713170] Learner 0 profile tree view:
misc: 0.0044, prepare_batch: 6.8311
train: 15.5300
  epoch_init: 0.0049, minibatch_init: 0.0047, losses_postprocess: 0.2485, kl_divergence: 0.3022, after_optimizer: 2.1743
  calculate_losses: 6.5099
    losses_init: 0.0028, forward_head: 0.5498, bptt_initial: 3.4675, tail: 0.4757, advantages_returns: 0.1261, losses: 0.8851
    bptt: 0.8547
      bptt_forward_core: 0.8105
  update: 5.9615
    clip: 0.6831
[2025-03-28 17:56:11,718][2713170] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.1346, enqueue_policy_requests: 6.9249, env_step: 95.6459, overhead: 7.8198, complete_rollouts: 0.2001
save_policy_outputs: 7.4937
  split_output_tensors: 3.5673
[2025-03-28 17:56:11,718][2713170] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.1319, enqueue_policy_requests: 7.9354, env_step: 94.6488, overhead: 7.8292, complete_rollouts: 0.2029
save_policy_outputs: 7.6534
  split_output_tensors: 3.6544
[2025-03-28 17:56:11,720][2713170] Loop Runner_EvtLoop terminating...
[2025-03-28 17:56:11,721][2713170] Runner profile tree view:
main_loop: 161.0680
[2025-03-28 17:56:11,721][2713170] Collected {0: 4005888}, FPS: 24870.8
[2025-03-28 17:56:12,155][2713170] Loading existing experiment configuration from /home/stalaei/projects/deep-rl-class/notebooks/unit8/train_dir/default_experiment/config.json
[2025-03-28 17:56:12,157][2713170] Overriding arg 'num_workers' with value 1 passed from command line
[2025-03-28 17:56:12,158][2713170] Adding new argument 'no_render'=True that is not in the saved config file!
[2025-03-28 17:56:12,159][2713170] Adding new argument 'save_video'=True that is not in the saved config file!
[2025-03-28 17:56:12,160][2713170] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2025-03-28 17:56:12,161][2713170] Adding new argument 'video_name'=None that is not in the saved config file!
[2025-03-28 17:56:12,162][2713170] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
[2025-03-28 17:56:12,162][2713170] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2025-03-28 17:56:12,163][2713170] Adding new argument 'push_to_hub'=False that is not in the saved config file!
[2025-03-28 17:56:12,164][2713170] Adding new argument 'hf_repository'=None that is not in the saved config file!
[2025-03-28 17:56:12,165][2713170] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2025-03-28 17:56:12,166][2713170] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2025-03-28 17:56:12,166][2713170] Adding new argument 'train_script'=None that is not in the saved config file!
[2025-03-28 17:56:12,167][2713170] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2025-03-28 17:56:12,168][2713170] Using frameskip 1 and render_action_repeat=4 for evaluation
[2025-03-28 17:56:12,192][2713170] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-03-28 17:56:12,196][2713170] RunningMeanStd input shape: (3, 72, 128)
[2025-03-28 17:56:12,197][2713170] RunningMeanStd input shape: (1,)
[2025-03-28 17:56:12,210][2713170] ConvEncoder: input_channels=3
[2025-03-28 17:56:12,294][2713170] Conv encoder output size: 512
[2025-03-28 17:56:12,295][2713170] Policy head output size: 512
[2025-03-28 17:56:14,346][2713170] Loading state from checkpoint /home/stalaei/projects/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-03-28 17:56:14,350][2713170] Could not load from checkpoint, attempt 0
Traceback (most recent call last):
  File "/home/stalaei/miniconda3/envs/DeepRL/lib/python3.10/site-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
  File "/home/stalaei/miniconda3/envs/DeepRL/lib/python3.10/site-packages/torch/serialization.py", line 1470, in load
    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
	WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function.

Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
[2025-03-28 17:56:14,354][2713170] Loading state from checkpoint /home/stalaei/projects/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-03-28 17:56:14,355][2713170] Could not load from checkpoint, attempt 1
Traceback (most recent call last):
  File "/home/stalaei/miniconda3/envs/DeepRL/lib/python3.10/site-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
  File "/home/stalaei/miniconda3/envs/DeepRL/lib/python3.10/site-packages/torch/serialization.py", line 1470, in load
    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
	WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function.

Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
[2025-03-28 17:56:14,356][2713170] Loading state from checkpoint /home/stalaei/projects/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-03-28 17:56:14,358][2713170] Could not load from checkpoint, attempt 2
Traceback (most recent call last):
  File "/home/stalaei/miniconda3/envs/DeepRL/lib/python3.10/site-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
  File "/home/stalaei/miniconda3/envs/DeepRL/lib/python3.10/site-packages/torch/serialization.py", line 1470, in load
    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
	WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function.

Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
[2025-03-28 17:59:10,915][2713170] Loading existing experiment configuration from /home/stalaei/projects/deep-rl-class/notebooks/unit8/train_dir/default_experiment/config.json
[2025-03-28 17:59:10,917][2713170] Overriding arg 'num_workers' with value 1 passed from command line
[2025-03-28 17:59:10,918][2713170] Adding new argument 'no_render'=True that is not in the saved config file!
[2025-03-28 17:59:10,918][2713170] Adding new argument 'save_video'=False that is not in the saved config file!
[2025-03-28 17:59:10,919][2713170] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2025-03-28 17:59:10,920][2713170] Adding new argument 'video_name'=None that is not in the saved config file!
[2025-03-28 17:59:10,920][2713170] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
[2025-03-28 17:59:10,921][2713170] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2025-03-28 17:59:10,922][2713170] Adding new argument 'push_to_hub'=False that is not in the saved config file!
[2025-03-28 17:59:10,923][2713170] Adding new argument 'hf_repository'=None that is not in the saved config file!
[2025-03-28 17:59:10,923][2713170] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2025-03-28 17:59:10,924][2713170] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2025-03-28 17:59:10,925][2713170] Adding new argument 'train_script'=None that is not in the saved config file!
[2025-03-28 17:59:10,926][2713170] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2025-03-28 17:59:10,927][2713170] Using frameskip 1 and render_action_repeat=4 for evaluation
[2025-03-28 17:59:10,965][2713170] RunningMeanStd input shape: (3, 72, 128)
[2025-03-28 17:59:10,966][2713170] RunningMeanStd input shape: (1,)
[2025-03-28 17:59:10,979][2713170] ConvEncoder: input_channels=3
[2025-03-28 17:59:11,015][2713170] Conv encoder output size: 512
[2025-03-28 17:59:11,016][2713170] Policy head output size: 512
[2025-03-28 17:59:11,049][2713170] Loading state from checkpoint /home/stalaei/projects/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-03-28 17:59:11,051][2713170] Could not load from checkpoint, attempt 0
Traceback (most recent call last):
  File "/home/stalaei/miniconda3/envs/DeepRL/lib/python3.10/site-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
  File "/home/stalaei/miniconda3/envs/DeepRL/lib/python3.10/site-packages/torch/serialization.py", line 1470, in load
    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
	WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function.

Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
[2025-03-28 17:59:11,052][2713170] Loading state from checkpoint /home/stalaei/projects/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-03-28 17:59:11,054][2713170] Could not load from checkpoint, attempt 1
Traceback (most recent call last):
  File "/home/stalaei/miniconda3/envs/DeepRL/lib/python3.10/site-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
  File "/home/stalaei/miniconda3/envs/DeepRL/lib/python3.10/site-packages/torch/serialization.py", line 1470, in load
    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
	WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function.

Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
[2025-03-28 17:59:11,055][2713170] Loading state from checkpoint /home/stalaei/projects/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-03-28 17:59:11,056][2713170] Could not load from checkpoint, attempt 2
Traceback (most recent call last):
  File "/home/stalaei/miniconda3/envs/DeepRL/lib/python3.10/site-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
  File "/home/stalaei/miniconda3/envs/DeepRL/lib/python3.10/site-packages/torch/serialization.py", line 1470, in load
    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
	WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function.

Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
[2025-03-28 18:03:15,287][2713170] Loading existing experiment configuration from /home/stalaei/projects/deep-rl-class/notebooks/unit8/train_dir/default_experiment/config.json
[2025-03-28 18:03:15,289][2713170] Overriding arg 'num_workers' with value 1 passed from command line
[2025-03-28 18:03:15,289][2713170] Adding new argument 'no_render'=True that is not in the saved config file!
[2025-03-28 18:03:15,290][2713170] Adding new argument 'save_video'=True that is not in the saved config file!
[2025-03-28 18:03:15,291][2713170] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2025-03-28 18:03:15,291][2713170] Adding new argument 'video_name'=None that is not in the saved config file!
[2025-03-28 18:03:15,292][2713170] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
[2025-03-28 18:03:15,293][2713170] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2025-03-28 18:03:15,294][2713170] Adding new argument 'push_to_hub'=False that is not in the saved config file!
[2025-03-28 18:03:15,294][2713170] Adding new argument 'hf_repository'=None that is not in the saved config file!
[2025-03-28 18:03:15,295][2713170] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2025-03-28 18:03:15,296][2713170] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2025-03-28 18:03:15,296][2713170] Adding new argument 'train_script'=None that is not in the saved config file!
[2025-03-28 18:03:15,297][2713170] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2025-03-28 18:03:15,298][2713170] Using frameskip 1 and render_action_repeat=4 for evaluation
[2025-03-28 18:03:15,317][2713170] RunningMeanStd input shape: (3, 72, 128)
[2025-03-28 18:03:15,318][2713170] RunningMeanStd input shape: (1,)
[2025-03-28 18:03:15,326][2713170] ConvEncoder: input_channels=3
[2025-03-28 18:03:15,355][2713170] Conv encoder output size: 512
[2025-03-28 18:03:15,355][2713170] Policy head output size: 512
[2025-03-28 18:03:16,020][2713170] Loading state from checkpoint /home/stalaei/projects/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-03-28 18:03:16,022][2713170] Could not load from checkpoint, attempt 0
Traceback (most recent call last):
  File "/home/stalaei/miniconda3/envs/DeepRL/lib/python3.10/site-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
  File "/home/stalaei/miniconda3/envs/DeepRL/lib/python3.10/site-packages/torch/serialization.py", line 1470, in load
    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
	WeightsUnpickler error: Unsupported global: GLOBAL numpy.dtype was not an allowed global by default. Please use `torch.serialization.add_safe_globals([dtype])` or the `torch.serialization.safe_globals([dtype])` context manager to allowlist this global if you trust this class/function.

Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
[2025-03-28 18:03:16,024][2713170] Loading state from checkpoint /home/stalaei/projects/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-03-28 18:03:16,025][2713170] Could not load from checkpoint, attempt 1
Traceback (most recent call last):
  File "/home/stalaei/miniconda3/envs/DeepRL/lib/python3.10/site-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
  File "/home/stalaei/miniconda3/envs/DeepRL/lib/python3.10/site-packages/torch/serialization.py", line 1470, in load
    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
	WeightsUnpickler error: Unsupported global: GLOBAL numpy.dtype was not an allowed global by default. Please use `torch.serialization.add_safe_globals([dtype])` or the `torch.serialization.safe_globals([dtype])` context manager to allowlist this global if you trust this class/function.

Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
[2025-03-28 18:03:16,026][2713170] Loading state from checkpoint /home/stalaei/projects/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-03-28 18:03:16,027][2713170] Could not load from checkpoint, attempt 2
Traceback (most recent call last):
  File "/home/stalaei/miniconda3/envs/DeepRL/lib/python3.10/site-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
  File "/home/stalaei/miniconda3/envs/DeepRL/lib/python3.10/site-packages/torch/serialization.py", line 1470, in load
    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
	WeightsUnpickler error: Unsupported global: GLOBAL numpy.dtype was not an allowed global by default. Please use `torch.serialization.add_safe_globals([dtype])` or the `torch.serialization.safe_globals([dtype])` context manager to allowlist this global if you trust this class/function.

Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
[2025-03-28 18:05:00,938][2713170] Loading existing experiment configuration from /home/stalaei/projects/deep-rl-class/notebooks/unit8/train_dir/default_experiment/config.json
[2025-03-28 18:05:00,939][2713170] Overriding arg 'num_workers' with value 1 passed from command line
[2025-03-28 18:05:00,940][2713170] Adding new argument 'no_render'=True that is not in the saved config file!
[2025-03-28 18:05:00,941][2713170] Adding new argument 'save_video'=True that is not in the saved config file!
[2025-03-28 18:05:00,941][2713170] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2025-03-28 18:05:00,942][2713170] Adding new argument 'video_name'=None that is not in the saved config file!
[2025-03-28 18:05:00,943][2713170] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
[2025-03-28 18:05:00,943][2713170] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2025-03-28 18:05:00,944][2713170] Adding new argument 'push_to_hub'=False that is not in the saved config file!
[2025-03-28 18:05:00,945][2713170] Adding new argument 'hf_repository'=None that is not in the saved config file!
[2025-03-28 18:05:00,946][2713170] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2025-03-28 18:05:00,946][2713170] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2025-03-28 18:05:00,947][2713170] Adding new argument 'train_script'=None that is not in the saved config file!
[2025-03-28 18:05:00,948][2713170] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2025-03-28 18:05:00,948][2713170] Using frameskip 1 and render_action_repeat=4 for evaluation
[2025-03-28 18:05:00,966][2713170] RunningMeanStd input shape: (3, 72, 128)
[2025-03-28 18:05:00,967][2713170] RunningMeanStd input shape: (1,)
[2025-03-28 18:05:00,976][2713170] ConvEncoder: input_channels=3
[2025-03-28 18:05:01,005][2713170] Conv encoder output size: 512
[2025-03-28 18:05:01,006][2713170] Policy head output size: 512
[2025-03-28 18:05:01,959][2713170] Num frames 100...
[2025-03-28 18:05:02,060][2713170] Num frames 200...
[2025-03-28 18:05:02,157][2713170] Num frames 300...
[2025-03-28 18:05:02,257][2713170] Num frames 400...
[2025-03-28 18:05:02,356][2713170] Num frames 500...
[2025-03-28 18:05:02,448][2713170] Num frames 600...
[2025-03-28 18:05:02,550][2713170] Num frames 700...
[2025-03-28 18:05:02,614][2713170] Avg episode rewards: #0: 14.060, true rewards: #0: 7.060
[2025-03-28 18:05:02,615][2713170] Avg episode reward: 14.060, avg true_objective: 7.060
[2025-03-28 18:05:02,736][2713170] Num frames 800...
[2025-03-28 18:05:02,836][2713170] Num frames 900...
[2025-03-28 18:05:02,931][2713170] Num frames 1000...
[2025-03-28 18:05:03,017][2713170] Num frames 1100...
[2025-03-28 18:05:03,110][2713170] Num frames 1200...
[2025-03-28 18:05:03,206][2713170] Num frames 1300...
[2025-03-28 18:05:03,301][2713170] Num frames 1400...
[2025-03-28 18:05:03,396][2713170] Num frames 1500...
[2025-03-28 18:05:03,485][2713170] Num frames 1600...
[2025-03-28 18:05:03,576][2713170] Avg episode rewards: #0: 19.680, true rewards: #0: 8.180
[2025-03-28 18:05:03,577][2713170] Avg episode reward: 19.680, avg true_objective: 8.180
[2025-03-28 18:05:03,652][2713170] Num frames 1700...
[2025-03-28 18:05:03,755][2713170] Num frames 1800...
[2025-03-28 18:05:03,854][2713170] Num frames 1900...
[2025-03-28 18:05:03,950][2713170] Num frames 2000...
[2025-03-28 18:05:04,041][2713170] Num frames 2100...
[2025-03-28 18:05:04,137][2713170] Num frames 2200...
[2025-03-28 18:05:04,233][2713170] Num frames 2300...
[2025-03-28 18:05:04,331][2713170] Num frames 2400...
[2025-03-28 18:05:04,427][2713170] Num frames 2500...
[2025-03-28 18:05:04,528][2713170] Num frames 2600...
[2025-03-28 18:05:04,628][2713170] Num frames 2700...
[2025-03-28 18:05:04,721][2713170] Num frames 2800...
[2025-03-28 18:05:04,808][2713170] Num frames 2900...
[2025-03-28 18:05:04,893][2713170] Num frames 3000...
[2025-03-28 18:05:04,979][2713170] Num frames 3100...
[2025-03-28 18:05:05,064][2713170] Num frames 3200...
[2025-03-28 18:05:05,152][2713170] Num frames 3300...
[2025-03-28 18:05:05,236][2713170] Num frames 3400...
[2025-03-28 18:05:05,319][2713170] Num frames 3500...
[2025-03-28 18:05:05,403][2713170] Num frames 3600...
[2025-03-28 18:05:05,487][2713170] Num frames 3700...
[2025-03-28 18:05:05,578][2713170] Avg episode rewards: #0: 32.120, true rewards: #0: 12.453
[2025-03-28 18:05:05,579][2713170] Avg episode reward: 32.120, avg true_objective: 12.453
[2025-03-28 18:05:05,632][2713170] Num frames 3800...
[2025-03-28 18:05:05,714][2713170] Num frames 3900...
[2025-03-28 18:05:05,797][2713170] Num frames 4000...
[2025-03-28 18:05:05,881][2713170] Num frames 4100...
[2025-03-28 18:05:05,965][2713170] Num frames 4200...
[2025-03-28 18:05:06,049][2713170] Num frames 4300...
[2025-03-28 18:05:06,131][2713170] Num frames 4400...
[2025-03-28 18:05:06,214][2713170] Num frames 4500...
[2025-03-28 18:05:06,298][2713170] Num frames 4600...
[2025-03-28 18:05:06,380][2713170] Num frames 4700...
[2025-03-28 18:05:06,464][2713170] Num frames 4800...
[2025-03-28 18:05:06,549][2713170] Num frames 4900...
[2025-03-28 18:05:06,624][2713170] Avg episode rewards: #0: 31.050, true rewards: #0: 12.300
[2025-03-28 18:05:06,625][2713170] Avg episode reward: 31.050, avg true_objective: 12.300
[2025-03-28 18:05:06,690][2713170] Num frames 5000...
[2025-03-28 18:05:06,772][2713170] Num frames 5100...
[2025-03-28 18:05:06,854][2713170] Num frames 5200...
[2025-03-28 18:05:06,937][2713170] Num frames 5300...
[2025-03-28 18:05:07,018][2713170] Num frames 5400...
[2025-03-28 18:05:07,098][2713170] Num frames 5500...
[2025-03-28 18:05:07,181][2713170] Num frames 5600...
[2025-03-28 18:05:07,266][2713170] Num frames 5700...
[2025-03-28 18:05:07,365][2713170] Avg episode rewards: #0: 27.904, true rewards: #0: 11.504
[2025-03-28 18:05:07,366][2713170] Avg episode reward: 27.904, avg true_objective: 11.504
[2025-03-28 18:05:07,418][2713170] Num frames 5800...
[2025-03-28 18:05:07,501][2713170] Num frames 5900...
[2025-03-28 18:05:07,580][2713170] Num frames 6000...
[2025-03-28 18:05:07,662][2713170] Num frames 6100...
[2025-03-28 18:05:07,744][2713170] Num frames 6200...
[2025-03-28 18:05:07,826][2713170] Num frames 6300...
[2025-03-28 18:05:07,905][2713170] Avg episode rewards: #0: 25.047, true rewards: #0: 10.547
[2025-03-28 18:05:07,906][2713170] Avg episode reward: 25.047, avg true_objective: 10.547
[2025-03-28 18:05:07,965][2713170] Num frames 6400...
[2025-03-28 18:05:08,048][2713170] Num frames 6500...
[2025-03-28 18:05:08,130][2713170] Num frames 6600...
[2025-03-28 18:05:08,213][2713170] Num frames 6700...
[2025-03-28 18:05:08,295][2713170] Num frames 6800...
[2025-03-28 18:05:08,377][2713170] Num frames 6900...
[2025-03-28 18:05:08,464][2713170] Num frames 7000...
[2025-03-28 18:05:08,515][2713170] Avg episode rewards: #0: 23.571, true rewards: #0: 10.000
[2025-03-28 18:05:08,516][2713170] Avg episode reward: 23.571, avg true_objective: 10.000
[2025-03-28 18:05:08,617][2713170] Num frames 7100...
[2025-03-28 18:05:08,700][2713170] Num frames 7200...
[2025-03-28 18:05:08,783][2713170] Num frames 7300...
[2025-03-28 18:05:08,866][2713170] Num frames 7400...
[2025-03-28 18:05:08,948][2713170] Num frames 7500...
[2025-03-28 18:05:09,030][2713170] Num frames 7600...
[2025-03-28 18:05:09,094][2713170] Avg episode rewards: #0: 22.135, true rewards: #0: 9.510
[2025-03-28 18:05:09,095][2713170] Avg episode reward: 22.135, avg true_objective: 9.510
[2025-03-28 18:05:09,191][2713170] Num frames 7700...
[2025-03-28 18:05:09,270][2713170] Num frames 7800...
[2025-03-28 18:05:09,351][2713170] Num frames 7900...
[2025-03-28 18:05:09,430][2713170] Num frames 8000...
[2025-03-28 18:05:09,509][2713170] Num frames 8100...
[2025-03-28 18:05:09,591][2713170] Num frames 8200...
[2025-03-28 18:05:09,712][2713170] Avg episode rewards: #0: 21.089, true rewards: #0: 9.200
[2025-03-28 18:05:09,712][2713170] Avg episode reward: 21.089, avg true_objective: 9.200
[2025-03-28 18:05:09,730][2713170] Num frames 8300...
[2025-03-28 18:05:09,812][2713170] Num frames 8400...
[2025-03-28 18:05:09,897][2713170] Num frames 8500...
[2025-03-28 18:05:09,978][2713170] Num frames 8600...
[2025-03-28 18:05:10,061][2713170] Num frames 8700...
[2025-03-28 18:05:10,144][2713170] Num frames 8800...
[2025-03-28 18:05:10,227][2713170] Num frames 8900...
[2025-03-28 18:05:10,299][2713170] Avg episode rewards: #0: 20.120, true rewards: #0: 8.920
[2025-03-28 18:05:10,300][2713170] Avg episode reward: 20.120, avg true_objective: 8.920
[2025-03-28 18:05:14,141][2713170] Replay video saved to /home/stalaei/projects/deep-rl-class/notebooks/unit8/train_dir/default_experiment/replay.mp4!
[2025-03-28 18:06:26,046][2713170] Loading existing experiment configuration from /home/stalaei/projects/deep-rl-class/notebooks/unit8/train_dir/default_experiment/config.json
[2025-03-28 18:06:26,047][2713170] Overriding arg 'num_workers' with value 1 passed from command line
[2025-03-28 18:06:26,048][2713170] Adding new argument 'no_render'=True that is not in the saved config file!
[2025-03-28 18:06:26,048][2713170] Adding new argument 'save_video'=True that is not in the saved config file!
[2025-03-28 18:06:26,049][2713170] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2025-03-28 18:06:26,050][2713170] Adding new argument 'video_name'=None that is not in the saved config file!
[2025-03-28 18:06:26,050][2713170] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2025-03-28 18:06:26,051][2713170] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2025-03-28 18:06:26,052][2713170] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2025-03-28 18:06:26,053][2713170] Adding new argument 'hf_repository'='stalaei/DeepRL_vizdoom_health_gathering_supreme' that is not in the saved config file!
[2025-03-28 18:06:26,053][2713170] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2025-03-28 18:06:26,054][2713170] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2025-03-28 18:06:26,055][2713170] Adding new argument 'train_script'=None that is not in the saved config file!
[2025-03-28 18:06:26,056][2713170] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2025-03-28 18:06:26,057][2713170] Using frameskip 1 and render_action_repeat=4 for evaluation
[2025-03-28 18:06:26,076][2713170] RunningMeanStd input shape: (3, 72, 128)
[2025-03-28 18:06:26,077][2713170] RunningMeanStd input shape: (1,)
[2025-03-28 18:06:26,087][2713170] ConvEncoder: input_channels=3
[2025-03-28 18:06:26,120][2713170] Conv encoder output size: 512
[2025-03-28 18:06:26,120][2713170] Policy head output size: 512
[2025-03-28 18:06:26,659][2713170] Num frames 100...
[2025-03-28 18:06:26,742][2713170] Num frames 200...
[2025-03-28 18:06:26,824][2713170] Num frames 300...
[2025-03-28 18:06:26,906][2713170] Num frames 400...
[2025-03-28 18:06:26,989][2713170] Num frames 500...
[2025-03-28 18:06:27,081][2713170] Num frames 600...
[2025-03-28 18:06:27,176][2713170] Num frames 700...
[2025-03-28 18:06:27,276][2713170] Num frames 800...
[2025-03-28 18:06:27,328][2713170] Avg episode rewards: #0: 14.000, true rewards: #0: 8.000
[2025-03-28 18:06:27,329][2713170] Avg episode reward: 14.000, avg true_objective: 8.000
[2025-03-28 18:06:27,455][2713170] Num frames 900...
[2025-03-28 18:06:27,551][2713170] Num frames 1000...
[2025-03-28 18:06:27,646][2713170] Num frames 1100...
[2025-03-28 18:06:27,738][2713170] Num frames 1200...
[2025-03-28 18:06:27,807][2713170] Avg episode rewards: #0: 10.080, true rewards: #0: 6.080
[2025-03-28 18:06:27,808][2713170] Avg episode reward: 10.080, avg true_objective: 6.080
[2025-03-28 18:06:27,915][2713170] Num frames 1300...
[2025-03-28 18:06:28,010][2713170] Num frames 1400...
[2025-03-28 18:06:28,106][2713170] Num frames 1500...
[2025-03-28 18:06:28,203][2713170] Num frames 1600...
[2025-03-28 18:06:28,296][2713170] Num frames 1700...
[2025-03-28 18:06:28,393][2713170] Num frames 1800...
[2025-03-28 18:06:28,487][2713170] Num frames 1900...
[2025-03-28 18:06:28,583][2713170] Num frames 2000...
[2025-03-28 18:06:28,679][2713170] Num frames 2100...
[2025-03-28 18:06:28,781][2713170] Avg episode rewards: #0: 12.827, true rewards: #0: 7.160
[2025-03-28 18:06:28,781][2713170] Avg episode reward: 12.827, avg true_objective: 7.160
[2025-03-28 18:06:28,852][2713170] Num frames 2200...
[2025-03-28 18:06:28,950][2713170] Num frames 2300...
[2025-03-28 18:06:29,048][2713170] Num frames 2400...
[2025-03-28 18:06:29,142][2713170] Num frames 2500...
[2025-03-28 18:06:29,238][2713170] Num frames 2600...
[2025-03-28 18:06:29,323][2713170] Num frames 2700...
[2025-03-28 18:06:29,416][2713170] Num frames 2800...
[2025-03-28 18:06:29,516][2713170] Num frames 2900...
[2025-03-28 18:06:29,612][2713170] Num frames 3000...
[2025-03-28 18:06:29,708][2713170] Num frames 3100...
[2025-03-28 18:06:29,806][2713170] Num frames 3200...
[2025-03-28 18:06:29,945][2713170] Avg episode rewards: #0: 16.683, true rewards: #0: 8.182
[2025-03-28 18:06:29,946][2713170] Avg episode reward: 16.683, avg true_objective: 8.182
[2025-03-28 18:06:29,994][2713170] Num frames 3300...
[2025-03-28 18:06:30,112][2713170] Num frames 3400...
[2025-03-28 18:06:30,208][2713170] Num frames 3500...
[2025-03-28 18:06:30,296][2713170] Num frames 3600...
[2025-03-28 18:06:30,390][2713170] Num frames 3700...
[2025-03-28 18:06:30,497][2713170] Avg episode rewards: #0: 14.906, true rewards: #0: 7.506
[2025-03-28 18:06:30,498][2713170] Avg episode reward: 14.906, avg true_objective: 7.506
[2025-03-28 18:06:30,564][2713170] Num frames 3800...
[2025-03-28 18:06:30,663][2713170] Num frames 3900...
[2025-03-28 18:06:30,761][2713170] Num frames 4000...
[2025-03-28 18:06:30,854][2713170] Num frames 4100...
[2025-03-28 18:06:30,957][2713170] Num frames 4200...
[2025-03-28 18:06:31,055][2713170] Num frames 4300...
[2025-03-28 18:06:31,154][2713170] Num frames 4400...
[2025-03-28 18:06:31,254][2713170] Num frames 4500...
[2025-03-28 18:06:31,341][2713170] Num frames 4600...
[2025-03-28 18:06:31,429][2713170] Num frames 4700...
[2025-03-28 18:06:31,520][2713170] Num frames 4800...
[2025-03-28 18:06:31,615][2713170] Num frames 4900...
[2025-03-28 18:06:31,710][2713170] Num frames 5000...
[2025-03-28 18:06:31,837][2713170] Avg episode rewards: #0: 17.132, true rewards: #0: 8.465
[2025-03-28 18:06:31,837][2713170] Avg episode reward: 17.132, avg true_objective: 8.465
[2025-03-28 18:06:31,858][2713170] Num frames 5100...
[2025-03-28 18:06:31,946][2713170] Num frames 5200...
[2025-03-28 18:06:32,030][2713170] Num frames 5300...
[2025-03-28 18:06:32,115][2713170] Num frames 5400...
[2025-03-28 18:06:32,200][2713170] Num frames 5500...
[2025-03-28 18:06:32,285][2713170] Num frames 5600...
[2025-03-28 18:06:32,382][2713170] Avg episode rewards: #0: 16.073, true rewards: #0: 8.073
[2025-03-28 18:06:32,383][2713170] Avg episode reward: 16.073, avg true_objective: 8.073
[2025-03-28 18:06:32,429][2713170] Num frames 5700...
[2025-03-28 18:06:32,525][2713170] Num frames 5800...
[2025-03-28 18:06:32,622][2713170] Num frames 5900...
[2025-03-28 18:06:32,718][2713170] Num frames 6000...
[2025-03-28 18:06:32,867][2713170] Avg episode rewards: #0: 14.749, true rewards: #0: 7.624
[2025-03-28 18:06:32,868][2713170] Avg episode reward: 14.749, avg true_objective: 7.624
[2025-03-28 18:06:32,870][2713170] Num frames 6100...
[2025-03-28 18:06:32,969][2713170] Num frames 6200...
[2025-03-28 18:06:33,066][2713170] Num frames 6300...
[2025-03-28 18:06:33,162][2713170] Num frames 6400...
[2025-03-28 18:06:33,264][2713170] Num frames 6500...
[2025-03-28 18:06:33,357][2713170] Num frames 6600...
[2025-03-28 18:06:33,454][2713170] Num frames 6700...
[2025-03-28 18:06:33,548][2713170] Num frames 6800...
[2025-03-28 18:06:33,654][2713170] Avg episode rewards: #0: 14.717, true rewards: #0: 7.606
[2025-03-28 18:06:33,655][2713170] Avg episode reward: 14.717, avg true_objective: 7.606
[2025-03-28 18:06:33,735][2713170] Num frames 6900...
[2025-03-28 18:06:33,829][2713170] Num frames 7000...
[2025-03-28 18:06:33,914][2713170] Num frames 7100...
[2025-03-28 18:06:33,998][2713170] Num frames 7200...
[2025-03-28 18:06:34,131][2713170] Avg episode rewards: #0: 14.093, true rewards: #0: 7.293
[2025-03-28 18:06:34,131][2713170] Avg episode reward: 14.093, avg true_objective: 7.293
[2025-03-28 18:06:37,271][2713170] Replay video saved to /home/stalaei/projects/deep-rl-class/notebooks/unit8/train_dir/default_experiment/replay.mp4!
[2025-03-28 18:06:42,159][2713170] The model has been pushed to https://huggingface.co/stalaei/DeepRL_vizdoom_health_gathering_supreme
[2025-03-28 18:07:55,657][2713170] Environment doom_basic already registered, overwriting...
[2025-03-28 18:07:55,660][2713170] Environment doom_two_colors_easy already registered, overwriting...
[2025-03-28 18:07:55,661][2713170] Environment doom_two_colors_hard already registered, overwriting...
[2025-03-28 18:07:55,662][2713170] Environment doom_dm already registered, overwriting...
[2025-03-28 18:07:55,663][2713170] Environment doom_dwango5 already registered, overwriting...
[2025-03-28 18:07:55,664][2713170] Environment doom_my_way_home_flat_actions already registered, overwriting...
[2025-03-28 18:07:55,665][2713170] Environment doom_defend_the_center_flat_actions already registered, overwriting...
[2025-03-28 18:07:55,666][2713170] Environment doom_my_way_home already registered, overwriting...
[2025-03-28 18:07:55,667][2713170] Environment doom_deadly_corridor already registered, overwriting...
[2025-03-28 18:07:55,668][2713170] Environment doom_defend_the_center already registered, overwriting...
[2025-03-28 18:07:55,669][2713170] Environment doom_defend_the_line already registered, overwriting...
[2025-03-28 18:07:55,669][2713170] Environment doom_health_gathering already registered, overwriting...
[2025-03-28 18:07:55,670][2713170] Environment doom_health_gathering_supreme already registered, overwriting...
[2025-03-28 18:07:55,671][2713170] Environment doom_battle already registered, overwriting...
[2025-03-28 18:07:55,672][2713170] Environment doom_battle2 already registered, overwriting...
[2025-03-28 18:07:55,672][2713170] Environment doom_duel_bots already registered, overwriting...
[2025-03-28 18:07:55,673][2713170] Environment doom_deathmatch_bots already registered, overwriting...
[2025-03-28 18:07:55,674][2713170] Environment doom_duel already registered, overwriting...
[2025-03-28 18:07:55,675][2713170] Environment doom_deathmatch_full already registered, overwriting...
[2025-03-28 18:07:55,675][2713170] Environment doom_benchmark already registered, overwriting...
[2025-03-28 18:07:55,676][2713170] register_encoder_factory: <function make_vizdoom_encoder at 0x7ff10d919c60>
[2025-03-28 18:07:55,733][2713170] Loading existing experiment configuration from /home/stalaei/projects/deep-rl-class/notebooks/unit8/train_dir/default_experiment/config.json
[2025-03-28 18:07:55,734][2713170] Overriding arg 'train_for_env_steps' with value 20000000 passed from command line
[2025-03-28 18:07:55,742][2713170] Experiment dir /home/stalaei/projects/deep-rl-class/notebooks/unit8/train_dir/default_experiment already exists!
[2025-03-28 18:07:55,743][2713170] Resuming existing experiment from /home/stalaei/projects/deep-rl-class/notebooks/unit8/train_dir/default_experiment...
[2025-03-28 18:07:55,744][2713170] Weights and Biases integration disabled
[2025-03-28 18:07:55,749][2713170] Environment var CUDA_VISIBLE_DEVICES is 0,1,2,3,4

[2025-03-28 18:07:59,954][2713170] Starting experiment with the following configuration:
help=False
algo=APPO
env=doom_health_gathering_supreme
experiment=default_experiment
train_dir=/home/stalaei/projects/deep-rl-class/notebooks/unit8/train_dir
restart_behavior=resume
device=gpu
seed=None
num_policies=1
async_rl=True
serial_mode=False
batched_sampling=False
num_batches_to_accumulate=2
worker_num_splits=2
policy_workers_per_policy=1
max_policy_lag=1000
num_workers=8
num_envs_per_worker=4
batch_size=1024
num_batches_per_epoch=1
num_epochs=1
rollout=32
recurrence=32
shuffle_minibatches=False
gamma=0.99
reward_scale=1.0
reward_clip=1000.0
value_bootstrap=False
normalize_returns=True
exploration_loss_coeff=0.001
value_loss_coeff=0.5
kl_loss_coeff=0.0
exploration_loss=symmetric_kl
gae_lambda=0.95
ppo_clip_ratio=0.1
ppo_clip_value=0.2
with_vtrace=False
vtrace_rho=1.0
vtrace_c=1.0
optimizer=adam
adam_eps=1e-06
adam_beta1=0.9
adam_beta2=0.999
max_grad_norm=4.0
learning_rate=0.0001
lr_schedule=constant
lr_schedule_kl_threshold=0.008
lr_adaptive_min=1e-06
lr_adaptive_max=0.01
obs_subtract_mean=0.0
obs_scale=255.0
normalize_input=True
normalize_input_keys=None
decorrelate_experience_max_seconds=0
decorrelate_envs_on_one_worker=True
actor_worker_gpus=[]
set_workers_cpu_affinity=True
force_envs_single_thread=False
default_niceness=0
log_to_file=True
experiment_summaries_interval=10
flush_summaries_interval=30
stats_avg=100
summaries_use_frameskip=True
heartbeat_interval=20
heartbeat_reporting_interval=600
train_for_env_steps=20000000
train_for_seconds=10000000000
save_every_sec=120
keep_checkpoints=2
load_checkpoint_kind=latest
save_milestones_sec=-1
save_best_every_sec=5
save_best_metric=reward
save_best_after=100000
benchmark=False
encoder_mlp_layers=[512, 512]
encoder_conv_architecture=convnet_simple
encoder_conv_mlp_layers=[512]
use_rnn=True
rnn_size=512
rnn_type=gru
rnn_num_layers=1
decoder_mlp_layers=[]
nonlinearity=elu
policy_initialization=orthogonal
policy_init_gain=1.0
actor_critic_share_weights=True
adaptive_stddev=True
continuous_tanh_scale=0.0
initial_stddev=1.0
use_env_info_cache=False
env_gpu_actions=False
env_gpu_observations=True
env_frameskip=4
env_framestack=1
pixel_format=CHW
use_record_episode_statistics=False
with_wandb=False
wandb_user=None
wandb_project=sample_factory
wandb_group=None
wandb_job_type=SF
wandb_tags=[]
with_pbt=False
pbt_mix_policies_in_one_env=True
pbt_period_env_steps=5000000
pbt_start_mutation=20000000
pbt_replace_fraction=0.3
pbt_mutation_rate=0.15
pbt_replace_reward_gap=0.1
pbt_replace_reward_gap_absolute=1e-06
pbt_optimize_gamma=False
pbt_target_objective=true_objective
pbt_perturb_min=1.1
pbt_perturb_max=1.5
num_agents=-1
num_humans=0
num_bots=-1
start_bot_difficulty=None
timelimit=None
res_w=128
res_h=72
wide_aspect_ratio=False
eval_env_frameskip=1
fps=35
command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=4000000
cli_args={'env': 'doom_health_gathering_supreme', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 4000000}
git_hash=be21bbf2a4a24818a3c258f91917092a29a01603
git_repo_name=https://github.com/ShayanTalaei/deep-rl-class.git
[2025-03-28 18:07:59,957][2713170] Saving configuration to /home/stalaei/projects/deep-rl-class/notebooks/unit8/train_dir/default_experiment/config.json...
[2025-03-28 18:08:00,030][2713170] Rollout worker 0 uses device cpu
[2025-03-28 18:08:00,031][2713170] Rollout worker 1 uses device cpu
[2025-03-28 18:08:00,032][2713170] Rollout worker 2 uses device cpu
[2025-03-28 18:08:00,032][2713170] Rollout worker 3 uses device cpu
[2025-03-28 18:08:00,033][2713170] Rollout worker 4 uses device cpu
[2025-03-28 18:08:00,034][2713170] Rollout worker 5 uses device cpu
[2025-03-28 18:08:00,034][2713170] Rollout worker 6 uses device cpu
[2025-03-28 18:08:00,035][2713170] Rollout worker 7 uses device cpu
[2025-03-28 18:08:00,075][2713170] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-03-28 18:08:00,076][2713170] InferenceWorker_p0-w0: min num requests: 2
[2025-03-28 18:08:00,671][2713170] Starting all processes...
[2025-03-28 18:08:00,672][2713170] Starting process learner_proc0
[2025-03-28 18:08:00,746][2713170] Starting all processes...
[2025-03-28 18:08:00,750][2713170] Starting process inference_proc0-0
[2025-03-28 18:08:00,751][2713170] Starting process rollout_proc0
[2025-03-28 18:08:00,751][2713170] Starting process rollout_proc1
[2025-03-28 18:08:00,753][2713170] Starting process rollout_proc2
[2025-03-28 18:08:00,754][2713170] Starting process rollout_proc3
[2025-03-28 18:08:00,756][2713170] Starting process rollout_proc4
[2025-03-28 18:08:00,758][2713170] Starting process rollout_proc5
[2025-03-28 18:08:00,759][2713170] Starting process rollout_proc6
[2025-03-28 18:08:00,763][2713170] Starting process rollout_proc7
[2025-03-28 18:08:03,436][2761553] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-03-28 18:08:03,436][2761553] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2025-03-28 18:08:03,498][2761576] Worker 1 uses CPU cores [32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63]
[2025-03-28 18:08:03,528][2761582] Worker 7 uses CPU cores [224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255]
[2025-03-28 18:08:03,552][2761581] Worker 6 uses CPU cores [192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223]
[2025-03-28 18:08:03,554][2761553] Num visible devices: 1
[2025-03-28 18:08:03,555][2761553] Starting seed is not provided
[2025-03-28 18:08:03,555][2761553] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-03-28 18:08:03,556][2761553] Initializing actor-critic model on device cuda:0
[2025-03-28 18:08:03,556][2761553] RunningMeanStd input shape: (3, 72, 128)
[2025-03-28 18:08:03,575][2761578] Worker 4 uses CPU cores [128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159]
[2025-03-28 18:08:03,582][2761553] RunningMeanStd input shape: (1,)
[2025-03-28 18:08:03,583][2761574] Worker 0 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31]
[2025-03-28 18:08:03,597][2761553] ConvEncoder: input_channels=3
[2025-03-28 18:08:03,618][2761579] Worker 5 uses CPU cores [160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191]
[2025-03-28 18:08:03,629][2761575] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-03-28 18:08:03,629][2761575] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2025-03-28 18:08:03,629][2761580] Worker 3 uses CPU cores [96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127]
[2025-03-28 18:08:03,679][2761577] Worker 2 uses CPU cores [64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95]
[2025-03-28 18:08:03,688][2761575] Num visible devices: 1
[2025-03-28 18:08:03,695][2761553] Conv encoder output size: 512
[2025-03-28 18:08:03,695][2761553] Policy head output size: 512
[2025-03-28 18:08:03,707][2761553] Created Actor Critic model with architecture:
[2025-03-28 18:08:03,707][2761553] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2025-03-28 18:08:04,133][2761553] Using optimizer <class 'torch.optim.adam.Adam'>
[2025-03-28 18:08:05,697][2761553] Loading state from checkpoint /home/stalaei/projects/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-03-28 18:08:05,699][2761553] Could not load from checkpoint, attempt 0
Traceback (most recent call last):
  File "/home/stalaei/miniconda3/envs/DeepRL/lib/python3.10/site-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
  File "/home/stalaei/miniconda3/envs/DeepRL/lib/python3.10/site-packages/torch/serialization.py", line 1470, in load
    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
	WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function.

Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
[2025-03-28 18:08:05,700][2761553] Loading state from checkpoint /home/stalaei/projects/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-03-28 18:08:05,701][2761553] Could not load from checkpoint, attempt 1
Traceback (most recent call last):
  File "/home/stalaei/miniconda3/envs/DeepRL/lib/python3.10/site-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
  File "/home/stalaei/miniconda3/envs/DeepRL/lib/python3.10/site-packages/torch/serialization.py", line 1470, in load
    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
	WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function.

Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
[2025-03-28 18:08:05,701][2761553] Loading state from checkpoint /home/stalaei/projects/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-03-28 18:08:05,701][2761553] Could not load from checkpoint, attempt 2
Traceback (most recent call last):
  File "/home/stalaei/miniconda3/envs/DeepRL/lib/python3.10/site-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
  File "/home/stalaei/miniconda3/envs/DeepRL/lib/python3.10/site-packages/torch/serialization.py", line 1470, in load
    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
	WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function.

Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
[2025-03-28 18:08:05,702][2761553] Did not load from checkpoint, starting from scratch!
[2025-03-28 18:08:05,702][2761553] Initialized policy 0 weights for model version 0
[2025-03-28 18:08:06,198][2761553] LearnerWorker_p0 finished initialization!
[2025-03-28 18:08:06,198][2761553] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-03-28 18:08:06,532][2761575] RunningMeanStd input shape: (3, 72, 128)
[2025-03-28 18:08:06,539][2761575] RunningMeanStd input shape: (1,)
[2025-03-28 18:08:06,548][2761575] ConvEncoder: input_channels=3
[2025-03-28 18:08:06,623][2761575] Conv encoder output size: 512
[2025-03-28 18:08:06,623][2761575] Policy head output size: 512
[2025-03-28 18:08:06,733][2713170] Inference worker 0-0 is ready!
[2025-03-28 18:08:06,734][2713170] All inference workers are ready! Signal rollout workers to start!
[2025-03-28 18:08:06,770][2761582] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-03-28 18:08:06,770][2761580] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-03-28 18:08:06,781][2761577] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-03-28 18:08:06,791][2761581] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-03-28 18:08:06,799][2761574] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-03-28 18:08:06,799][2761578] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-03-28 18:08:06,801][2761579] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-03-28 18:08:06,809][2761576] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-03-28 18:08:07,201][2761580] Decorrelating experience for 0 frames...
[2025-03-28 18:08:07,217][2761582] Decorrelating experience for 0 frames...
[2025-03-28 18:08:07,247][2761574] Decorrelating experience for 0 frames...
[2025-03-28 18:08:07,248][2761578] Decorrelating experience for 0 frames...
[2025-03-28 18:08:07,248][2761576] Decorrelating experience for 0 frames...
[2025-03-28 18:08:07,250][2761577] Decorrelating experience for 0 frames...
[2025-03-28 18:08:07,661][2761576] Decorrelating experience for 32 frames...
[2025-03-28 18:08:07,662][2761578] Decorrelating experience for 32 frames...
[2025-03-28 18:08:07,663][2761581] Decorrelating experience for 0 frames...
[2025-03-28 18:08:07,676][2761580] Decorrelating experience for 32 frames...
[2025-03-28 18:08:07,676][2761579] Decorrelating experience for 0 frames...
[2025-03-28 18:08:08,085][2761579] Decorrelating experience for 32 frames...
[2025-03-28 18:08:08,098][2761581] Decorrelating experience for 32 frames...
[2025-03-28 18:08:08,107][2761574] Decorrelating experience for 32 frames...
[2025-03-28 18:08:08,142][2761577] Decorrelating experience for 32 frames...
[2025-03-28 18:08:08,147][2761576] Decorrelating experience for 64 frames...
[2025-03-28 18:08:08,148][2761582] Decorrelating experience for 32 frames...
[2025-03-28 18:08:08,153][2761580] Decorrelating experience for 64 frames...
[2025-03-28 18:08:08,475][2761578] Decorrelating experience for 64 frames...
[2025-03-28 18:08:08,548][2761579] Decorrelating experience for 64 frames...
[2025-03-28 18:08:08,561][2761574] Decorrelating experience for 64 frames...
[2025-03-28 18:08:08,562][2761576] Decorrelating experience for 96 frames...
[2025-03-28 18:08:08,577][2761581] Decorrelating experience for 64 frames...
[2025-03-28 18:08:08,876][2761578] Decorrelating experience for 96 frames...
[2025-03-28 18:08:08,984][2761579] Decorrelating experience for 96 frames...
[2025-03-28 18:08:08,998][2761582] Decorrelating experience for 64 frames...
[2025-03-28 18:08:08,999][2761580] Decorrelating experience for 96 frames...
[2025-03-28 18:08:09,004][2761581] Decorrelating experience for 96 frames...
[2025-03-28 18:08:09,428][2761574] Decorrelating experience for 96 frames...
[2025-03-28 18:08:09,429][2761577] Decorrelating experience for 64 frames...
[2025-03-28 18:08:09,434][2761582] Decorrelating experience for 96 frames...
[2025-03-28 18:08:09,835][2761553] Signal inference workers to stop experience collection...
[2025-03-28 18:08:09,838][2761575] InferenceWorker_p0-w0: stopping experience collection
[2025-03-28 18:08:09,851][2761577] Decorrelating experience for 96 frames...
[2025-03-28 18:08:10,749][2713170] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 1940. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2025-03-28 18:08:10,750][2713170] Avg episode reward: [(0, '2.092')]
[2025-03-28 18:08:12,053][2761553] Signal inference workers to resume experience collection...
[2025-03-28 18:08:12,054][2761575] InferenceWorker_p0-w0: resuming experience collection
[2025-03-28 18:08:13,204][2761575] Updated weights for policy 0, policy_version 10 (0.0062)
[2025-03-28 18:08:14,532][2761575] Updated weights for policy 0, policy_version 20 (0.0006)
[2025-03-28 18:08:15,749][2713170] Fps is (10 sec: 22118.3, 60 sec: 22118.3, 300 sec: 22118.3). Total num frames: 110592. Throughput: 0: 1034.4. Samples: 7112. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-03-28 18:08:15,751][2713170] Avg episode reward: [(0, '4.391')]
[2025-03-28 18:08:15,756][2761553] Saving new best policy, reward=4.391!
[2025-03-28 18:08:16,436][2761575] Updated weights for policy 0, policy_version 30 (0.0006)
[2025-03-28 18:08:17,836][2761575] Updated weights for policy 0, policy_version 40 (0.0006)
[2025-03-28 18:08:19,264][2761575] Updated weights for policy 0, policy_version 50 (0.0006)
[2025-03-28 18:08:20,062][2713170] Heartbeat connected on Batcher_0
[2025-03-28 18:08:20,072][2713170] Heartbeat connected on LearnerWorker_p0
[2025-03-28 18:08:20,076][2713170] Heartbeat connected on InferenceWorker_p0-w0
[2025-03-28 18:08:20,088][2713170] Heartbeat connected on RolloutWorker_w1
[2025-03-28 18:08:20,089][2713170] Heartbeat connected on RolloutWorker_w0
[2025-03-28 18:08:20,093][2713170] Heartbeat connected on RolloutWorker_w2
[2025-03-28 18:08:20,098][2713170] Heartbeat connected on RolloutWorker_w3
[2025-03-28 18:08:20,100][2713170] Heartbeat connected on RolloutWorker_w4
[2025-03-28 18:08:20,664][2713170] Heartbeat connected on RolloutWorker_w5
[2025-03-28 18:08:20,669][2713170] Heartbeat connected on RolloutWorker_w6
[2025-03-28 18:08:20,680][2713170] Heartbeat connected on RolloutWorker_w7
[2025-03-28 18:08:20,719][2761575] Updated weights for policy 0, policy_version 60 (0.0006)
[2025-03-28 18:08:20,749][2713170] Fps is (10 sec: 24576.4, 60 sec: 24576.4, 300 sec: 24576.4). Total num frames: 245760. Throughput: 0: 4571.7. Samples: 47656. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-28 18:08:20,750][2713170] Avg episode reward: [(0, '4.609')]
[2025-03-28 18:08:20,751][2761553] Saving new best policy, reward=4.609!
[2025-03-28 18:08:22,242][2761575] Updated weights for policy 0, policy_version 70 (0.0006)
[2025-03-28 18:08:23,738][2761575] Updated weights for policy 0, policy_version 80 (0.0006)
[2025-03-28 18:08:25,198][2761575] Updated weights for policy 0, policy_version 90 (0.0007)
[2025-03-28 18:08:25,749][2713170] Fps is (10 sec: 27033.7, 60 sec: 25395.2, 300 sec: 25395.2). Total num frames: 380928. Throughput: 0: 5858.5. Samples: 89818. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-28 18:08:25,751][2713170] Avg episode reward: [(0, '4.515')]
[2025-03-28 18:08:26,774][2761575] Updated weights for policy 0, policy_version 100 (0.0006)
[2025-03-28 18:08:28,294][2761575] Updated weights for policy 0, policy_version 110 (0.0006)
[2025-03-28 18:08:29,875][2761575] Updated weights for policy 0, policy_version 120 (0.0006)
[2025-03-28 18:08:30,749][2713170] Fps is (10 sec: 25804.7, 60 sec: 25190.6, 300 sec: 25190.6). Total num frames: 503808. Throughput: 0: 6265.3. Samples: 127246. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-28 18:08:30,750][2713170] Avg episode reward: [(0, '4.567')]
[2025-03-28 18:08:31,694][2761575] Updated weights for policy 0, policy_version 130 (0.0006)
[2025-03-28 18:08:33,141][2761575] Updated weights for policy 0, policy_version 140 (0.0006)
[2025-03-28 18:08:34,722][2761575] Updated weights for policy 0, policy_version 150 (0.0006)
[2025-03-28 18:08:35,749][2713170] Fps is (10 sec: 26214.4, 60 sec: 25722.9, 300 sec: 25722.9). Total num frames: 643072. Throughput: 0: 5833.7. Samples: 147782. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-03-28 18:08:35,750][2713170] Avg episode reward: [(0, '4.442')]
[2025-03-28 18:08:36,223][2761575] Updated weights for policy 0, policy_version 160 (0.0006)
[2025-03-28 18:08:37,796][2761575] Updated weights for policy 0, policy_version 170 (0.0006)
[2025-03-28 18:08:39,312][2761575] Updated weights for policy 0, policy_version 180 (0.0006)
[2025-03-28 18:08:40,749][2713170] Fps is (10 sec: 27033.4, 60 sec: 25804.9, 300 sec: 25804.9). Total num frames: 774144. Throughput: 0: 6202.1. Samples: 188002. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-28 18:08:40,750][2713170] Avg episode reward: [(0, '4.381')]
[2025-03-28 18:08:40,807][2761575] Updated weights for policy 0, policy_version 190 (0.0007)
[2025-03-28 18:08:42,363][2761575] Updated weights for policy 0, policy_version 200 (0.0007)
[2025-03-28 18:08:43,838][2761575] Updated weights for policy 0, policy_version 210 (0.0006)
[2025-03-28 18:08:45,376][2761575] Updated weights for policy 0, policy_version 220 (0.0007)
[2025-03-28 18:08:45,749][2713170] Fps is (10 sec: 26624.4, 60 sec: 25980.5, 300 sec: 25980.5). Total num frames: 909312. Throughput: 0: 6468.4. Samples: 228332. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-28 18:08:45,750][2713170] Avg episode reward: [(0, '4.580')]
[2025-03-28 18:08:46,927][2761575] Updated weights for policy 0, policy_version 230 (0.0006)
[2025-03-28 18:08:48,481][2761575] Updated weights for policy 0, policy_version 240 (0.0006)
[2025-03-28 18:08:50,030][2761575] Updated weights for policy 0, policy_version 250 (0.0006)
[2025-03-28 18:08:50,749][2713170] Fps is (10 sec: 26624.2, 60 sec: 26009.7, 300 sec: 26009.7). Total num frames: 1040384. Throughput: 0: 6154.3. Samples: 248110. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-28 18:08:50,751][2713170] Avg episode reward: [(0, '4.720')]
[2025-03-28 18:08:50,752][2761553] Saving new best policy, reward=4.720!
[2025-03-28 18:08:51,570][2761575] Updated weights for policy 0, policy_version 260 (0.0007)
[2025-03-28 18:08:53,044][2761575] Updated weights for policy 0, policy_version 270 (0.0006)
[2025-03-28 18:08:54,522][2761575] Updated weights for policy 0, policy_version 280 (0.0006)
[2025-03-28 18:08:55,749][2713170] Fps is (10 sec: 26623.6, 60 sec: 26123.4, 300 sec: 26123.4). Total num frames: 1175552. Throughput: 0: 6373.9. Samples: 288764. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-28 18:08:55,750][2713170] Avg episode reward: [(0, '4.768')]
[2025-03-28 18:08:55,769][2761553] Saving new best policy, reward=4.768!
[2025-03-28 18:08:56,096][2761575] Updated weights for policy 0, policy_version 290 (0.0007)
[2025-03-28 18:08:57,583][2761575] Updated weights for policy 0, policy_version 300 (0.0006)
[2025-03-28 18:08:59,084][2761575] Updated weights for policy 0, policy_version 310 (0.0006)
[2025-03-28 18:09:00,749][2713170] Fps is (10 sec: 26623.9, 60 sec: 26132.6, 300 sec: 26132.6). Total num frames: 1306624. Throughput: 0: 7108.7. Samples: 327002. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-03-28 18:09:00,750][2713170] Avg episode reward: [(0, '4.578')]
[2025-03-28 18:09:00,892][2761575] Updated weights for policy 0, policy_version 320 (0.0008)
[2025-03-28 18:09:02,440][2761575] Updated weights for policy 0, policy_version 330 (0.0006)
[2025-03-28 18:09:03,890][2761575] Updated weights for policy 0, policy_version 340 (0.0007)
[2025-03-28 18:09:05,437][2761575] Updated weights for policy 0, policy_version 350 (0.0006)
[2025-03-28 18:09:05,749][2713170] Fps is (10 sec: 26624.2, 60 sec: 26214.5, 300 sec: 26214.5). Total num frames: 1441792. Throughput: 0: 6665.3. Samples: 347596. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-03-28 18:09:05,750][2713170] Avg episode reward: [(0, '4.792')]
[2025-03-28 18:09:05,754][2761553] Saving new best policy, reward=4.792!
[2025-03-28 18:09:07,011][2761575] Updated weights for policy 0, policy_version 360 (0.0006)
[2025-03-28 18:09:08,543][2761575] Updated weights for policy 0, policy_version 370 (0.0006)
[2025-03-28 18:09:10,066][2761575] Updated weights for policy 0, policy_version 380 (0.0006)
[2025-03-28 18:09:10,749][2713170] Fps is (10 sec: 25394.9, 60 sec: 26009.6, 300 sec: 26009.6). Total num frames: 1560576. Throughput: 0: 6614.2. Samples: 387456. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-03-28 18:09:10,751][2713170] Avg episode reward: [(0, '5.040')]
[2025-03-28 18:09:10,752][2761553] Saving new best policy, reward=5.040!
[2025-03-28 18:09:12,754][2761575] Updated weights for policy 0, policy_version 390 (0.0006)
[2025-03-28 18:09:14,116][2761575] Updated weights for policy 0, policy_version 400 (0.0006)
[2025-03-28 18:09:15,665][2761575] Updated weights for policy 0, policy_version 410 (0.0006)
[2025-03-28 18:09:15,749][2713170] Fps is (10 sec: 23756.9, 60 sec: 26146.2, 300 sec: 25836.4). Total num frames: 1679360. Throughput: 0: 6527.7. Samples: 420992. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-03-28 18:09:15,750][2713170] Avg episode reward: [(0, '4.568')]
[2025-03-28 18:09:17,179][2761575] Updated weights for policy 0, policy_version 420 (0.0006)
[2025-03-28 18:09:18,690][2761575] Updated weights for policy 0, policy_version 430 (0.0006)
[2025-03-28 18:09:20,165][2761575] Updated weights for policy 0, policy_version 440 (0.0006)
[2025-03-28 18:09:20,749][2713170] Fps is (10 sec: 25395.6, 60 sec: 26146.1, 300 sec: 25921.9). Total num frames: 1814528. Throughput: 0: 6523.8. Samples: 441350. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-28 18:09:20,750][2713170] Avg episode reward: [(0, '4.834')]
[2025-03-28 18:09:21,640][2761575] Updated weights for policy 0, policy_version 450 (0.0007)
[2025-03-28 18:09:23,138][2761575] Updated weights for policy 0, policy_version 460 (0.0006)
[2025-03-28 18:09:24,567][2761575] Updated weights for policy 0, policy_version 470 (0.0006)
[2025-03-28 18:09:25,749][2713170] Fps is (10 sec: 27852.7, 60 sec: 26282.7, 300 sec: 26105.2). Total num frames: 1957888. Throughput: 0: 6562.1. Samples: 483298. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-28 18:09:25,750][2713170] Avg episode reward: [(0, '4.767')]
[2025-03-28 18:09:26,026][2761575] Updated weights for policy 0, policy_version 480 (0.0006)
[2025-03-28 18:09:27,537][2761575] Updated weights for policy 0, policy_version 490 (0.0006)
[2025-03-28 18:09:29,060][2761575] Updated weights for policy 0, policy_version 500 (0.0006)
[2025-03-28 18:09:30,749][2713170] Fps is (10 sec: 26623.9, 60 sec: 26282.7, 300 sec: 26009.6). Total num frames: 2080768. Throughput: 0: 6491.6. Samples: 520454. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-28 18:09:30,750][2713170] Avg episode reward: [(0, '4.612')]
[2025-03-28 18:09:31,055][2761575] Updated weights for policy 0, policy_version 510 (0.0008)
[2025-03-28 18:09:32,617][2761575] Updated weights for policy 0, policy_version 520 (0.0006)
[2025-03-28 18:09:34,152][2761575] Updated weights for policy 0, policy_version 530 (0.0006)
[2025-03-28 18:09:35,656][2761575] Updated weights for policy 0, policy_version 540 (0.0006)
[2025-03-28 18:09:35,749][2713170] Fps is (10 sec: 25395.3, 60 sec: 26146.2, 300 sec: 26021.7). Total num frames: 2211840. Throughput: 0: 6502.2. Samples: 540710. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-28 18:09:35,750][2713170] Avg episode reward: [(0, '4.914')]
[2025-03-28 18:09:37,189][2761575] Updated weights for policy 0, policy_version 550 (0.0006)
[2025-03-28 18:09:38,700][2761575] Updated weights for policy 0, policy_version 560 (0.0006)
[2025-03-28 18:09:40,235][2761575] Updated weights for policy 0, policy_version 570 (0.0006)
[2025-03-28 18:09:40,749][2713170] Fps is (10 sec: 26624.1, 60 sec: 26214.4, 300 sec: 26077.9). Total num frames: 2347008. Throughput: 0: 6497.7. Samples: 581158. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-03-28 18:09:40,750][2713170] Avg episode reward: [(0, '4.892')]
[2025-03-28 18:09:41,772][2761575] Updated weights for policy 0, policy_version 580 (0.0006)
[2025-03-28 18:09:43,302][2761575] Updated weights for policy 0, policy_version 590 (0.0006)
[2025-03-28 18:09:44,866][2761575] Updated weights for policy 0, policy_version 600 (0.0006)
[2025-03-28 18:09:45,749][2713170] Fps is (10 sec: 26623.8, 60 sec: 26146.1, 300 sec: 26085.1). Total num frames: 2478080. Throughput: 0: 6528.9. Samples: 620802. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-03-28 18:09:45,750][2713170] Avg episode reward: [(0, '4.685')]
[2025-03-28 18:09:46,391][2761575] Updated weights for policy 0, policy_version 610 (0.0006)
[2025-03-28 18:09:47,967][2761575] Updated weights for policy 0, policy_version 620 (0.0006)
[2025-03-28 18:09:49,521][2761575] Updated weights for policy 0, policy_version 630 (0.0006)
[2025-03-28 18:09:50,749][2713170] Fps is (10 sec: 26214.4, 60 sec: 26146.1, 300 sec: 26091.6). Total num frames: 2609152. Throughput: 0: 6511.8. Samples: 640626. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-28 18:09:50,750][2713170] Avg episode reward: [(0, '4.976')]
[2025-03-28 18:09:51,091][2761575] Updated weights for policy 0, policy_version 640 (0.0006)
[2025-03-28 18:09:52,602][2761575] Updated weights for policy 0, policy_version 650 (0.0006)
[2025-03-28 18:09:54,165][2761575] Updated weights for policy 0, policy_version 660 (0.0006)
[2025-03-28 18:09:55,601][2761575] Updated weights for policy 0, policy_version 670 (0.0006)
[2025-03-28 18:09:55,749][2713170] Fps is (10 sec: 26624.1, 60 sec: 26146.2, 300 sec: 26136.4). Total num frames: 2744320. Throughput: 0: 6514.9. Samples: 680628. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-03-28 18:09:55,750][2713170] Avg episode reward: [(0, '5.383')]
[2025-03-28 18:09:55,756][2761553] Saving /home/stalaei/projects/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000671_2748416.pth...
[2025-03-28 18:09:55,867][2761553] Removing /home/stalaei/projects/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000659_2699264.pth
[2025-03-28 18:09:55,874][2761553] Saving new best policy, reward=5.383!
[2025-03-28 18:09:57,073][2761575] Updated weights for policy 0, policy_version 680 (0.0006)
[2025-03-28 18:09:58,549][2761575] Updated weights for policy 0, policy_version 690 (0.0006)
[2025-03-28 18:10:00,749][2713170] Fps is (10 sec: 25394.9, 60 sec: 25941.3, 300 sec: 26028.2). Total num frames: 2863104. Throughput: 0: 6597.7. Samples: 717890. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-28 18:10:00,750][2713170] Avg episode reward: [(0, '5.080')]
[2025-03-28 18:10:00,758][2761575] Updated weights for policy 0, policy_version 700 (0.0007)
[2025-03-28 18:10:02,217][2761575] Updated weights for policy 0, policy_version 710 (0.0007)
[2025-03-28 18:10:03,811][2761575] Updated weights for policy 0, policy_version 720 (0.0006)
[2025-03-28 18:10:05,314][2761575] Updated weights for policy 0, policy_version 730 (0.0006)
[2025-03-28 18:10:05,749][2713170] Fps is (10 sec: 25804.9, 60 sec: 26009.6, 300 sec: 26107.6). Total num frames: 3002368. Throughput: 0: 6582.2. Samples: 737548. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-28 18:10:05,750][2713170] Avg episode reward: [(0, '4.590')]
[2025-03-28 18:10:06,754][2761575] Updated weights for policy 0, policy_version 740 (0.0006)
[2025-03-28 18:10:08,294][2761575] Updated weights for policy 0, policy_version 750 (0.0006)
[2025-03-28 18:10:09,818][2761575] Updated weights for policy 0, policy_version 760 (0.0006)
[2025-03-28 18:10:10,749][2713170] Fps is (10 sec: 25804.8, 60 sec: 26009.6, 300 sec: 26009.6). Total num frames: 3121152. Throughput: 0: 6559.1. Samples: 778456. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-28 18:10:10,750][2713170] Avg episode reward: [(0, '5.235')]
[2025-03-28 18:10:12,635][2761575] Updated weights for policy 0, policy_version 770 (0.0006)
[2025-03-28 18:10:14,193][2761575] Updated weights for policy 0, policy_version 780 (0.0006)
[2025-03-28 18:10:15,750][2761575] Updated weights for policy 0, policy_version 790 (0.0006)
[2025-03-28 18:10:15,749][2713170] Fps is (10 sec: 22937.3, 60 sec: 25873.0, 300 sec: 25854.0). Total num frames: 3231744. Throughput: 0: 6429.1. Samples: 809764. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-28 18:10:15,751][2713170] Avg episode reward: [(0, '5.369')]
[2025-03-28 18:10:17,226][2761575] Updated weights for policy 0, policy_version 800 (0.0006)
[2025-03-28 18:10:18,817][2761575] Updated weights for policy 0, policy_version 810 (0.0006)
[2025-03-28 18:10:20,405][2761575] Updated weights for policy 0, policy_version 820 (0.0006)
[2025-03-28 18:10:20,749][2713170] Fps is (10 sec: 24575.9, 60 sec: 25873.0, 300 sec: 25899.3). Total num frames: 3366912. Throughput: 0: 6417.5. Samples: 829498. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-28 18:10:20,750][2713170] Avg episode reward: [(0, '4.745')]
[2025-03-28 18:10:21,864][2761575] Updated weights for policy 0, policy_version 830 (0.0006)
[2025-03-28 18:10:23,400][2761575] Updated weights for policy 0, policy_version 840 (0.0006)
[2025-03-28 18:10:25,086][2761575] Updated weights for policy 0, policy_version 850 (0.0006)
[2025-03-28 18:10:25,749][2713170] Fps is (10 sec: 26624.1, 60 sec: 25668.2, 300 sec: 25911.0). Total num frames: 3497984. Throughput: 0: 6393.9. Samples: 868882. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-28 18:10:25,750][2713170] Avg episode reward: [(0, '5.201')]
[2025-03-28 18:10:26,598][2761575] Updated weights for policy 0, policy_version 860 (0.0006)
[2025-03-28 18:10:28,133][2761575] Updated weights for policy 0, policy_version 870 (0.0006)
[2025-03-28 18:10:29,700][2761575] Updated weights for policy 0, policy_version 880 (0.0006)
[2025-03-28 18:10:30,749][2713170] Fps is (10 sec: 25395.5, 60 sec: 25668.3, 300 sec: 25863.3). Total num frames: 3620864. Throughput: 0: 6330.9. Samples: 905692. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-28 18:10:30,751][2713170] Avg episode reward: [(0, '5.018')]
[2025-03-28 18:10:31,686][2761575] Updated weights for policy 0, policy_version 890 (0.0007)
[2025-03-28 18:10:33,272][2761575] Updated weights for policy 0, policy_version 900 (0.0006)
[2025-03-28 18:10:34,752][2761575] Updated weights for policy 0, policy_version 910 (0.0007)
[2025-03-28 18:10:35,749][2713170] Fps is (10 sec: 25395.4, 60 sec: 25668.3, 300 sec: 25875.5). Total num frames: 3751936. Throughput: 0: 6329.7. Samples: 925462. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-03-28 18:10:35,750][2713170] Avg episode reward: [(0, '5.452')]
[2025-03-28 18:10:35,754][2761553] Saving new best policy, reward=5.452!
[2025-03-28 18:10:36,266][2761575] Updated weights for policy 0, policy_version 920 (0.0006)
[2025-03-28 18:10:37,811][2761575] Updated weights for policy 0, policy_version 930 (0.0006)
[2025-03-28 18:10:39,288][2761575] Updated weights for policy 0, policy_version 940 (0.0006)
[2025-03-28 18:10:40,749][2713170] Fps is (10 sec: 26624.0, 60 sec: 25668.3, 300 sec: 25914.1). Total num frames: 3887104. Throughput: 0: 6343.9. Samples: 966104. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-03-28 18:10:40,750][2713170] Avg episode reward: [(0, '5.350')]
[2025-03-28 18:10:40,873][2761575] Updated weights for policy 0, policy_version 950 (0.0006)
[2025-03-28 18:10:42,368][2761575] Updated weights for policy 0, policy_version 960 (0.0006)
[2025-03-28 18:10:43,955][2761575] Updated weights for policy 0, policy_version 970 (0.0006)
[2025-03-28 18:10:45,536][2761575] Updated weights for policy 0, policy_version 980 (0.0006)
[2025-03-28 18:10:45,749][2713170] Fps is (10 sec: 26624.0, 60 sec: 25668.3, 300 sec: 25923.7). Total num frames: 4018176. Throughput: 0: 6390.7. Samples: 1005472. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-28 18:10:45,750][2713170] Avg episode reward: [(0, '5.500')]
[2025-03-28 18:10:45,756][2761553] Saving new best policy, reward=5.500!
[2025-03-28 18:10:47,015][2761575] Updated weights for policy 0, policy_version 990 (0.0007)
[2025-03-28 18:10:48,538][2761575] Updated weights for policy 0, policy_version 1000 (0.0007)
[2025-03-28 18:10:50,064][2761575] Updated weights for policy 0, policy_version 1010 (0.0007)
[2025-03-28 18:10:50,749][2713170] Fps is (10 sec: 26623.7, 60 sec: 25736.5, 300 sec: 25958.4). Total num frames: 4153344. Throughput: 0: 6405.5. Samples: 1025796. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-03-28 18:10:50,750][2713170] Avg episode reward: [(0, '5.786')]
[2025-03-28 18:10:50,752][2761553] Saving new best policy, reward=5.786!
[2025-03-28 18:10:51,614][2761575] Updated weights for policy 0, policy_version 1020 (0.0006)
[2025-03-28 18:10:53,076][2761575] Updated weights for policy 0, policy_version 1030 (0.0006)
[2025-03-28 18:10:54,607][2761575] Updated weights for policy 0, policy_version 1040 (0.0007)
[2025-03-28 18:10:55,749][2713170] Fps is (10 sec: 27032.7, 60 sec: 25736.4, 300 sec: 25991.0). Total num frames: 4288512. Throughput: 0: 6403.7. Samples: 1066626. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-28 18:10:55,751][2713170] Avg episode reward: [(0, '6.084')]
[2025-03-28 18:10:55,761][2761553] Saving new best policy, reward=6.084!
[2025-03-28 18:10:56,076][2761575] Updated weights for policy 0, policy_version 1050 (0.0006)
[2025-03-28 18:10:57,553][2761575] Updated weights for policy 0, policy_version 1060 (0.0006)
[2025-03-28 18:10:59,052][2761575] Updated weights for policy 0, policy_version 1070 (0.0006)
[2025-03-28 18:11:00,749][2713170] Fps is (10 sec: 25395.5, 60 sec: 25736.6, 300 sec: 25925.3). Total num frames: 4407296. Throughput: 0: 6171.8. Samples: 1087492. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-03-28 18:11:00,750][2713170] Avg episode reward: [(0, '6.544')]
[2025-03-28 18:11:00,847][2761553] Saving new best policy, reward=6.544!
[2025-03-28 18:11:01,323][2761575] Updated weights for policy 0, policy_version 1080 (0.0008)
[2025-03-28 18:11:02,906][2761575] Updated weights for policy 0, policy_version 1090 (0.0006)
[2025-03-28 18:11:04,463][2761575] Updated weights for policy 0, policy_version 1100 (0.0006)
[2025-03-28 18:11:05,749][2713170] Fps is (10 sec: 24986.4, 60 sec: 25600.0, 300 sec: 25933.6). Total num frames: 4538368. Throughput: 0: 6508.8. Samples: 1122394. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-28 18:11:05,750][2713170] Avg episode reward: [(0, '5.801')]
[2025-03-28 18:11:06,019][2761575] Updated weights for policy 0, policy_version 1110 (0.0006)
[2025-03-28 18:11:07,546][2761575] Updated weights for policy 0, policy_version 1120 (0.0006)
[2025-03-28 18:11:09,087][2761575] Updated weights for policy 0, policy_version 1130 (0.0006)
[2025-03-28 18:11:10,455][2761575] Updated weights for policy 0, policy_version 1140 (0.0006)
[2025-03-28 18:11:10,749][2713170] Fps is (10 sec: 26624.0, 60 sec: 25873.1, 300 sec: 25964.1). Total num frames: 4673536. Throughput: 0: 6528.0. Samples: 1162640. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2025-03-28 18:11:10,750][2713170] Avg episode reward: [(0, '5.945')]
[2025-03-28 18:11:11,886][2761575] Updated weights for policy 0, policy_version 1150 (0.0006)
[2025-03-28 18:11:13,312][2761575] Updated weights for policy 0, policy_version 1160 (0.0006)
[2025-03-28 18:11:14,832][2761575] Updated weights for policy 0, policy_version 1170 (0.0006)
[2025-03-28 18:11:15,749][2713170] Fps is (10 sec: 27852.9, 60 sec: 26419.3, 300 sec: 26037.3). Total num frames: 4816896. Throughput: 0: 6649.0. Samples: 1204898. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-28 18:11:15,750][2713170] Avg episode reward: [(0, '6.605')]
[2025-03-28 18:11:15,755][2761553] Saving new best policy, reward=6.605!
[2025-03-28 18:11:16,358][2761575] Updated weights for policy 0, policy_version 1180 (0.0006)
[2025-03-28 18:11:17,822][2761575] Updated weights for policy 0, policy_version 1190 (0.0006)
[2025-03-28 18:11:19,335][2761575] Updated weights for policy 0, policy_version 1200 (0.0006)
[2025-03-28 18:11:20,749][2713170] Fps is (10 sec: 27853.0, 60 sec: 26419.3, 300 sec: 26063.5). Total num frames: 4952064. Throughput: 0: 6661.7. Samples: 1225238. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-28 18:11:20,750][2713170] Avg episode reward: [(0, '6.253')]
[2025-03-28 18:11:20,886][2761575] Updated weights for policy 0, policy_version 1210 (0.0006)
[2025-03-28 18:11:22,395][2761575] Updated weights for policy 0, policy_version 1220 (0.0006)
[2025-03-28 18:11:23,965][2761575] Updated weights for policy 0, policy_version 1230 (0.0006)
[2025-03-28 18:11:25,388][2761575] Updated weights for policy 0, policy_version 1240 (0.0006)
[2025-03-28 18:11:25,749][2713170] Fps is (10 sec: 27033.6, 60 sec: 26487.5, 300 sec: 26088.4). Total num frames: 5087232. Throughput: 0: 6654.6. Samples: 1265560. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-28 18:11:25,750][2713170] Avg episode reward: [(0, '5.850')]
[2025-03-28 18:11:26,901][2761575] Updated weights for policy 0, policy_version 1250 (0.0006)
[2025-03-28 18:11:28,407][2761575] Updated weights for policy 0, policy_version 1260 (0.0006)
[2025-03-28 18:11:29,946][2761575] Updated weights for policy 0, policy_version 1270 (0.0006)
[2025-03-28 18:11:30,749][2713170] Fps is (10 sec: 25395.0, 60 sec: 26419.2, 300 sec: 26030.1). Total num frames: 5206016. Throughput: 0: 6242.3. Samples: 1286376. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-28 18:11:30,750][2713170] Avg episode reward: [(0, '8.475')]
[2025-03-28 18:11:30,974][2761553] Saving new best policy, reward=8.475!
[2025-03-28 18:11:32,224][2761575] Updated weights for policy 0, policy_version 1280 (0.0008)
[2025-03-28 18:11:33,736][2761575] Updated weights for policy 0, policy_version 1290 (0.0006)
[2025-03-28 18:11:35,240][2761575] Updated weights for policy 0, policy_version 1300 (0.0006)
[2025-03-28 18:11:35,749][2713170] Fps is (10 sec: 24985.6, 60 sec: 26419.2, 300 sec: 26034.6). Total num frames: 5337088. Throughput: 0: 6574.7. Samples: 1321656. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-28 18:11:35,750][2713170] Avg episode reward: [(0, '7.334')]
[2025-03-28 18:11:36,624][2761575] Updated weights for policy 0, policy_version 1310 (0.0006)
[2025-03-28 18:11:38,153][2761575] Updated weights for policy 0, policy_version 1320 (0.0006)
[2025-03-28 18:11:39,706][2761575] Updated weights for policy 0, policy_version 1330 (0.0006)
[2025-03-28 18:11:40,749][2713170] Fps is (10 sec: 26623.8, 60 sec: 26419.2, 300 sec: 26058.4). Total num frames: 5472256. Throughput: 0: 6581.9. Samples: 1362810. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-28 18:11:40,751][2713170] Avg episode reward: [(0, '8.143')]
[2025-03-28 18:11:41,220][2761575] Updated weights for policy 0, policy_version 1340 (0.0006)
[2025-03-28 18:11:42,746][2761575] Updated weights for policy 0, policy_version 1350 (0.0006)
[2025-03-28 18:11:44,281][2761575] Updated weights for policy 0, policy_version 1360 (0.0006)
[2025-03-28 18:11:45,730][2761575] Updated weights for policy 0, policy_version 1370 (0.0006)
[2025-03-28 18:11:45,749][2713170] Fps is (10 sec: 27443.1, 60 sec: 26555.7, 300 sec: 26100.1). Total num frames: 5611520. Throughput: 0: 7024.0. Samples: 1403572. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-28 18:11:45,750][2713170] Avg episode reward: [(0, '9.668')]
[2025-03-28 18:11:45,755][2761553] Saving new best policy, reward=9.668!
[2025-03-28 18:11:47,185][2761575] Updated weights for policy 0, policy_version 1380 (0.0006)
[2025-03-28 18:11:48,598][2761575] Updated weights for policy 0, policy_version 1390 (0.0006)
[2025-03-28 18:11:50,058][2761575] Updated weights for policy 0, policy_version 1400 (0.0006)
[2025-03-28 18:11:50,749][2713170] Fps is (10 sec: 27853.0, 60 sec: 26624.0, 300 sec: 26139.9). Total num frames: 5750784. Throughput: 0: 6724.3. Samples: 1424988. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-28 18:11:50,750][2713170] Avg episode reward: [(0, '9.729')]
[2025-03-28 18:11:50,751][2761553] Saving new best policy, reward=9.729!
[2025-03-28 18:11:51,453][2761575] Updated weights for policy 0, policy_version 1410 (0.0006)
[2025-03-28 18:11:52,834][2761575] Updated weights for policy 0, policy_version 1420 (0.0006)
[2025-03-28 18:11:54,310][2761575] Updated weights for policy 0, policy_version 1430 (0.0006)
[2025-03-28 18:11:55,749][2713170] Fps is (10 sec: 28262.1, 60 sec: 26760.6, 300 sec: 26196.2). Total num frames: 5894144. Throughput: 0: 6784.7. Samples: 1467952. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-28 18:11:55,750][2713170] Avg episode reward: [(0, '12.318')]
[2025-03-28 18:11:55,758][2761553] Saving /home/stalaei/projects/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000001439_5894144.pth...
[2025-03-28 18:11:55,883][2761553] Removing /home/stalaei/projects/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000671_2748416.pth
[2025-03-28 18:11:55,890][2761553] Saving new best policy, reward=12.318!
[2025-03-28 18:11:56,000][2761575] Updated weights for policy 0, policy_version 1440 (0.0007)
[2025-03-28 18:11:57,342][2761575] Updated weights for policy 0, policy_version 1450 (0.0006)
[2025-03-28 18:11:58,849][2761575] Updated weights for policy 0, policy_version 1460 (0.0006)
[2025-03-28 18:12:00,749][2713170] Fps is (10 sec: 26214.4, 60 sec: 26760.5, 300 sec: 26143.2). Total num frames: 6012928. Throughput: 0: 6296.2. Samples: 1488226. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-28 18:12:00,750][2713170] Avg episode reward: [(0, '12.366')]
[2025-03-28 18:12:01,082][2761553] Saving new best policy, reward=12.366!
[2025-03-28 18:12:01,232][2761575] Updated weights for policy 0, policy_version 1470 (0.0007)
[2025-03-28 18:12:02,771][2761575] Updated weights for policy 0, policy_version 1480 (0.0006)
[2025-03-28 18:12:04,254][2761575] Updated weights for policy 0, policy_version 1490 (0.0006)
[2025-03-28 18:12:05,728][2761575] Updated weights for policy 0, policy_version 1500 (0.0006)
[2025-03-28 18:12:05,749][2713170] Fps is (10 sec: 24986.0, 60 sec: 26760.5, 300 sec: 26144.7). Total num frames: 6144000. Throughput: 0: 6615.9. Samples: 1522952. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-03-28 18:12:05,750][2713170] Avg episode reward: [(0, '15.018')]
[2025-03-28 18:12:05,754][2761553] Saving new best policy, reward=15.018!
[2025-03-28 18:12:07,205][2761575] Updated weights for policy 0, policy_version 1510 (0.0006)
[2025-03-28 18:12:08,691][2761575] Updated weights for policy 0, policy_version 1520 (0.0006)
[2025-03-28 18:12:10,162][2761575] Updated weights for policy 0, policy_version 1530 (0.0006)
[2025-03-28 18:12:10,749][2713170] Fps is (10 sec: 27033.6, 60 sec: 26828.8, 300 sec: 26180.3). Total num frames: 6283264. Throughput: 0: 6642.0. Samples: 1564452. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-03-28 18:12:10,750][2713170] Avg episode reward: [(0, '17.186')]
[2025-03-28 18:12:10,751][2761553] Saving new best policy, reward=17.186!
[2025-03-28 18:12:11,626][2761575] Updated weights for policy 0, policy_version 1540 (0.0006)
[2025-03-28 18:12:13,102][2761575] Updated weights for policy 0, policy_version 1550 (0.0006)
[2025-03-28 18:12:14,573][2761575] Updated weights for policy 0, policy_version 1560 (0.0006)
[2025-03-28 18:12:15,749][2713170] Fps is (10 sec: 27852.8, 60 sec: 26760.5, 300 sec: 26214.4). Total num frames: 6422528. Throughput: 0: 7113.8. Samples: 1606498. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2025-03-28 18:12:15,750][2713170] Avg episode reward: [(0, '17.213')]
[2025-03-28 18:12:15,754][2761553] Saving new best policy, reward=17.213!
[2025-03-28 18:12:16,013][2761575] Updated weights for policy 0, policy_version 1570 (0.0006)
[2025-03-28 18:12:17,431][2761575] Updated weights for policy 0, policy_version 1580 (0.0006)
[2025-03-28 18:12:18,912][2761575] Updated weights for policy 0, policy_version 1590 (0.0006)
[2025-03-28 18:12:20,330][2761575] Updated weights for policy 0, policy_version 1600 (0.0006)
[2025-03-28 18:12:20,749][2713170] Fps is (10 sec: 27852.6, 60 sec: 26828.7, 300 sec: 26247.2). Total num frames: 6561792. Throughput: 0: 6797.4. Samples: 1627540. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-28 18:12:20,750][2713170] Avg episode reward: [(0, '17.337')]
[2025-03-28 18:12:20,752][2761553] Saving new best policy, reward=17.337!
[2025-03-28 18:12:21,768][2761575] Updated weights for policy 0, policy_version 1610 (0.0007)
[2025-03-28 18:12:23,151][2761575] Updated weights for policy 0, policy_version 1620 (0.0007)
[2025-03-28 18:12:24,578][2761575] Updated weights for policy 0, policy_version 1630 (0.0007)
[2025-03-28 18:12:25,749][2713170] Fps is (10 sec: 28671.9, 60 sec: 27033.6, 300 sec: 26310.8). Total num frames: 6709248. Throughput: 0: 6849.9. Samples: 1671054. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-28 18:12:25,750][2713170] Avg episode reward: [(0, '17.790')]
[2025-03-28 18:12:25,755][2761553] Saving new best policy, reward=17.790!
[2025-03-28 18:12:25,923][2761575] Updated weights for policy 0, policy_version 1640 (0.0006)
[2025-03-28 18:12:27,352][2761575] Updated weights for policy 0, policy_version 1650 (0.0006)
[2025-03-28 18:12:28,870][2761575] Updated weights for policy 0, policy_version 1660 (0.0006)
[2025-03-28 18:12:30,749][2713170] Fps is (10 sec: 27033.7, 60 sec: 27101.8, 300 sec: 26277.4). Total num frames: 6832128. Throughput: 0: 6430.9. Samples: 1692962. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-28 18:12:30,750][2713170] Avg episode reward: [(0, '21.064')]
[2025-03-28 18:12:31,210][2761553] Saving new best policy, reward=21.064!
[2025-03-28 18:12:31,332][2761575] Updated weights for policy 0, policy_version 1670 (0.0007)
[2025-03-28 18:12:32,939][2761575] Updated weights for policy 0, policy_version 1680 (0.0006)
[2025-03-28 18:12:34,426][2761575] Updated weights for policy 0, policy_version 1690 (0.0006)
[2025-03-28 18:12:35,749][2713170] Fps is (10 sec: 24985.6, 60 sec: 27033.6, 300 sec: 26260.8). Total num frames: 6959104. Throughput: 0: 6698.4. Samples: 1726418. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-28 18:12:35,750][2713170] Avg episode reward: [(0, '21.138')]
[2025-03-28 18:12:35,756][2761553] Saving new best policy, reward=21.138!
[2025-03-28 18:12:35,864][2761575] Updated weights for policy 0, policy_version 1700 (0.0007)
[2025-03-28 18:12:37,318][2761575] Updated weights for policy 0, policy_version 1710 (0.0006)
[2025-03-28 18:12:38,729][2761575] Updated weights for policy 0, policy_version 1720 (0.0006)
[2025-03-28 18:12:40,239][2761575] Updated weights for policy 0, policy_version 1730 (0.0006)
[2025-03-28 18:12:40,749][2713170] Fps is (10 sec: 26624.1, 60 sec: 27101.9, 300 sec: 26290.3). Total num frames: 7098368. Throughput: 0: 6688.9. Samples: 1768952. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-03-28 18:12:40,750][2713170] Avg episode reward: [(0, '20.840')]
[2025-03-28 18:12:41,700][2761575] Updated weights for policy 0, policy_version 1740 (0.0007)
[2025-03-28 18:12:43,280][2761575] Updated weights for policy 0, policy_version 1750 (0.0006)
[2025-03-28 18:12:44,930][2761575] Updated weights for policy 0, policy_version 1760 (0.0006)
[2025-03-28 18:12:45,749][2713170] Fps is (10 sec: 26623.8, 60 sec: 26897.0, 300 sec: 26274.0). Total num frames: 7225344. Throughput: 0: 7103.0. Samples: 1807860. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-03-28 18:12:45,750][2713170] Avg episode reward: [(0, '18.316')]
[2025-03-28 18:12:46,580][2761575] Updated weights for policy 0, policy_version 1770 (0.0006)
[2025-03-28 18:12:48,104][2761575] Updated weights for policy 0, policy_version 1780 (0.0006)
[2025-03-28 18:12:49,727][2761575] Updated weights for policy 0, policy_version 1790 (0.0006)
[2025-03-28 18:12:50,749][2713170] Fps is (10 sec: 25804.6, 60 sec: 26760.5, 300 sec: 26272.9). Total num frames: 7356416. Throughput: 0: 6760.8. Samples: 1827188. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-28 18:12:50,750][2713170] Avg episode reward: [(0, '16.420')]
[2025-03-28 18:12:51,277][2761575] Updated weights for policy 0, policy_version 1800 (0.0007)
[2025-03-28 18:12:52,847][2761575] Updated weights for policy 0, policy_version 1810 (0.0006)
[2025-03-28 18:12:54,402][2761575] Updated weights for policy 0, policy_version 1820 (0.0006)
[2025-03-28 18:12:55,749][2713170] Fps is (10 sec: 26214.5, 60 sec: 26555.8, 300 sec: 26271.9). Total num frames: 7487488. Throughput: 0: 6712.7. Samples: 1866526. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-28 18:12:55,750][2713170] Avg episode reward: [(0, '20.495')]
[2025-03-28 18:12:55,934][2761575] Updated weights for policy 0, policy_version 1830 (0.0006)
[2025-03-28 18:12:57,542][2761575] Updated weights for policy 0, policy_version 1840 (0.0006)
[2025-03-28 18:12:59,171][2761575] Updated weights for policy 0, policy_version 1850 (0.0006)
[2025-03-28 18:13:00,749][2713170] Fps is (10 sec: 24166.6, 60 sec: 26419.2, 300 sec: 26200.3). Total num frames: 7598080. Throughput: 0: 6213.4. Samples: 1886100. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-03-28 18:13:00,750][2713170] Avg episode reward: [(0, '23.074')]
[2025-03-28 18:13:01,220][2761553] Saving new best policy, reward=23.074!
[2025-03-28 18:13:01,883][2761575] Updated weights for policy 0, policy_version 1860 (0.0008)
[2025-03-28 18:13:03,466][2761575] Updated weights for policy 0, policy_version 1870 (0.0007)
[2025-03-28 18:13:05,004][2761575] Updated weights for policy 0, policy_version 1880 (0.0006)
[2025-03-28 18:13:05,749][2713170] Fps is (10 sec: 22937.6, 60 sec: 26214.4, 300 sec: 26158.9). Total num frames: 7716864. Throughput: 0: 6440.0. Samples: 1917340. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-28 18:13:05,750][2713170] Avg episode reward: [(0, '22.152')]
[2025-03-28 18:13:06,591][2761575] Updated weights for policy 0, policy_version 1890 (0.0006)
[2025-03-28 18:13:08,176][2761575] Updated weights for policy 0, policy_version 1900 (0.0006)
[2025-03-28 18:13:09,721][2761575] Updated weights for policy 0, policy_version 1910 (0.0006)
[2025-03-28 18:13:10,749][2713170] Fps is (10 sec: 24985.4, 60 sec: 26077.8, 300 sec: 26228.3). Total num frames: 7847936. Throughput: 0: 6346.0. Samples: 1956626. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-03-28 18:13:10,750][2713170] Avg episode reward: [(0, '19.936')]
[2025-03-28 18:13:11,213][2761575] Updated weights for policy 0, policy_version 1920 (0.0006)
[2025-03-28 18:13:12,629][2761575] Updated weights for policy 0, policy_version 1930 (0.0006)
[2025-03-28 18:13:14,184][2761575] Updated weights for policy 0, policy_version 1940 (0.0006)
[2025-03-28 18:13:15,626][2761575] Updated weights for policy 0, policy_version 1950 (0.0006)
[2025-03-28 18:13:15,749][2713170] Fps is (10 sec: 27033.4, 60 sec: 26077.8, 300 sec: 26242.2). Total num frames: 7987200. Throughput: 0: 6781.3. Samples: 1998122. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-28 18:13:15,750][2713170] Avg episode reward: [(0, '23.233')]
[2025-03-28 18:13:15,756][2761553] Saving new best policy, reward=23.233!
[2025-03-28 18:13:17,190][2761575] Updated weights for policy 0, policy_version 1960 (0.0006)
[2025-03-28 18:13:18,779][2761575] Updated weights for policy 0, policy_version 1970 (0.0007)
[2025-03-28 18:13:20,284][2761575] Updated weights for policy 0, policy_version 1980 (0.0006)
[2025-03-28 18:13:20,749][2713170] Fps is (10 sec: 27033.8, 60 sec: 25941.4, 300 sec: 26228.3). Total num frames: 8118272. Throughput: 0: 6474.8. Samples: 2017784. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-28 18:13:20,750][2713170] Avg episode reward: [(0, '23.993')]
[2025-03-28 18:13:20,752][2761553] Saving new best policy, reward=23.993!
[2025-03-28 18:13:21,840][2761575] Updated weights for policy 0, policy_version 1990 (0.0006)
[2025-03-28 18:13:23,284][2761575] Updated weights for policy 0, policy_version 2000 (0.0006)
[2025-03-28 18:13:24,791][2761575] Updated weights for policy 0, policy_version 2010 (0.0006)
[2025-03-28 18:13:25,749][2713170] Fps is (10 sec: 27443.6, 60 sec: 25873.1, 300 sec: 26297.7). Total num frames: 8261632. Throughput: 0: 6433.2. Samples: 2058446. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-03-28 18:13:25,750][2713170] Avg episode reward: [(0, '20.369')]
[2025-03-28 18:13:26,141][2761575] Updated weights for policy 0, policy_version 2020 (0.0006)
[2025-03-28 18:13:27,482][2761575] Updated weights for policy 0, policy_version 2030 (0.0006)
[2025-03-28 18:13:28,811][2761575] Updated weights for policy 0, policy_version 2040 (0.0006)
[2025-03-28 18:13:30,749][2713170] Fps is (10 sec: 27033.4, 60 sec: 25941.3, 300 sec: 26256.1). Total num frames: 8388608. Throughput: 0: 6075.7. Samples: 2081266. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-28 18:13:30,750][2713170] Avg episode reward: [(0, '25.474')]
[2025-03-28 18:13:31,235][2761553] Saving new best policy, reward=25.474!
[2025-03-28 18:13:31,360][2761575] Updated weights for policy 0, policy_version 2050 (0.0007)
[2025-03-28 18:13:32,911][2761575] Updated weights for policy 0, policy_version 2060 (0.0006)
[2025-03-28 18:13:34,347][2761575] Updated weights for policy 0, policy_version 2070 (0.0006)
[2025-03-28 18:13:35,749][2713170] Fps is (10 sec: 25395.2, 60 sec: 25941.4, 300 sec: 26242.2). Total num frames: 8515584. Throughput: 0: 6413.5. Samples: 2115796. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-03-28 18:13:35,750][2713170] Avg episode reward: [(0, '19.077')]
[2025-03-28 18:13:35,857][2761575] Updated weights for policy 0, policy_version 2080 (0.0006)
[2025-03-28 18:13:37,271][2761575] Updated weights for policy 0, policy_version 2090 (0.0006)
[2025-03-28 18:13:38,754][2761575] Updated weights for policy 0, policy_version 2100 (0.0006)
[2025-03-28 18:13:40,189][2761575] Updated weights for policy 0, policy_version 2110 (0.0006)
[2025-03-28 18:13:40,749][2713170] Fps is (10 sec: 26624.3, 60 sec: 25941.4, 300 sec: 26256.1). Total num frames: 8654848. Throughput: 0: 6476.9. Samples: 2157988. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-28 18:13:40,750][2713170] Avg episode reward: [(0, '21.779')]
[2025-03-28 18:13:41,644][2761575] Updated weights for policy 0, policy_version 2120 (0.0006)
[2025-03-28 18:13:43,144][2761575] Updated weights for policy 0, policy_version 2130 (0.0006)
[2025-03-28 18:13:44,616][2761575] Updated weights for policy 0, policy_version 2140 (0.0006)
[2025-03-28 18:13:45,749][2713170] Fps is (10 sec: 27852.6, 60 sec: 26146.2, 300 sec: 26283.8). Total num frames: 8794112. Throughput: 0: 6969.5. Samples: 2199726. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-28 18:13:45,750][2713170] Avg episode reward: [(0, '21.749')]
[2025-03-28 18:13:46,107][2761575] Updated weights for policy 0, policy_version 2150 (0.0006)
[2025-03-28 18:13:47,532][2761575] Updated weights for policy 0, policy_version 2160 (0.0006)
[2025-03-28 18:13:48,990][2761575] Updated weights for policy 0, policy_version 2170 (0.0006)
[2025-03-28 18:13:50,455][2761575] Updated weights for policy 0, policy_version 2180 (0.0006)
[2025-03-28 18:13:50,749][2713170] Fps is (10 sec: 28262.5, 60 sec: 26351.0, 300 sec: 26311.6). Total num frames: 8937472. Throughput: 0: 6750.1. Samples: 2221092. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-28 18:13:50,750][2713170] Avg episode reward: [(0, '24.903')]
[2025-03-28 18:13:51,874][2761575] Updated weights for policy 0, policy_version 2190 (0.0007)
[2025-03-28 18:13:53,272][2761575] Updated weights for policy 0, policy_version 2200 (0.0007)
[2025-03-28 18:13:54,739][2761575] Updated weights for policy 0, policy_version 2210 (0.0006)
[2025-03-28 18:13:55,749][2713170] Fps is (10 sec: 28672.0, 60 sec: 26555.8, 300 sec: 26353.2). Total num frames: 9080832. Throughput: 0: 6825.1. Samples: 2263756. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-28 18:13:55,750][2713170] Avg episode reward: [(0, '23.567')]
[2025-03-28 18:13:55,754][2761553] Saving /home/stalaei/projects/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000002217_9080832.pth...
[2025-03-28 18:13:55,849][2761553] Removing /home/stalaei/projects/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth
[2025-03-28 18:13:56,126][2761575] Updated weights for policy 0, policy_version 2220 (0.0006)
[2025-03-28 18:13:57,663][2761575] Updated weights for policy 0, policy_version 2230 (0.0006)
[2025-03-28 18:13:59,141][2761575] Updated weights for policy 0, policy_version 2240 (0.0006)
[2025-03-28 18:14:00,749][2713170] Fps is (10 sec: 26213.4, 60 sec: 26692.1, 300 sec: 26297.7). Total num frames: 9199616. Throughput: 0: 6369.3. Samples: 2284744. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-28 18:14:00,751][2713170] Avg episode reward: [(0, '24.604')]
[2025-03-28 18:14:01,763][2761575] Updated weights for policy 0, policy_version 2250 (0.0006)
[2025-03-28 18:14:03,310][2761575] Updated weights for policy 0, policy_version 2260 (0.0006)
[2025-03-28 18:14:04,749][2761575] Updated weights for policy 0, policy_version 2270 (0.0006)
[2025-03-28 18:14:05,749][2713170] Fps is (10 sec: 24166.2, 60 sec: 26760.5, 300 sec: 26311.6). Total num frames: 9322496. Throughput: 0: 6669.6. Samples: 2317916. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-03-28 18:14:05,750][2713170] Avg episode reward: [(0, '24.252')]
[2025-03-28 18:14:06,297][2761575] Updated weights for policy 0, policy_version 2280 (0.0006)
[2025-03-28 18:14:07,747][2761575] Updated weights for policy 0, policy_version 2290 (0.0006)
[2025-03-28 18:14:09,184][2761575] Updated weights for policy 0, policy_version 2300 (0.0006)
[2025-03-28 18:14:10,703][2761575] Updated weights for policy 0, policy_version 2310 (0.0006)
[2025-03-28 18:14:10,749][2713170] Fps is (10 sec: 26215.0, 60 sec: 26897.1, 300 sec: 26381.0). Total num frames: 9461760. Throughput: 0: 6696.1. Samples: 2359772. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-28 18:14:10,750][2713170] Avg episode reward: [(0, '24.326')]
[2025-03-28 18:14:12,216][2761575] Updated weights for policy 0, policy_version 2320 (0.0006)
[2025-03-28 18:14:13,745][2761575] Updated weights for policy 0, policy_version 2330 (0.0006)
[2025-03-28 18:14:15,209][2761575] Updated weights for policy 0, policy_version 2340 (0.0006)
[2025-03-28 18:14:15,749][2713170] Fps is (10 sec: 27443.2, 60 sec: 26828.8, 300 sec: 26381.0). Total num frames: 9596928. Throughput: 0: 7091.2. Samples: 2400370. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-28 18:14:15,750][2713170] Avg episode reward: [(0, '20.018')]
[2025-03-28 18:14:16,719][2761575] Updated weights for policy 0, policy_version 2350 (0.0006)
[2025-03-28 18:14:18,155][2761575] Updated weights for policy 0, policy_version 2360 (0.0006)
[2025-03-28 18:14:19,710][2761575] Updated weights for policy 0, policy_version 2370 (0.0006)
[2025-03-28 18:14:20,749][2713170] Fps is (10 sec: 27443.5, 60 sec: 26965.4, 300 sec: 26367.1). Total num frames: 9736192. Throughput: 0: 6782.9. Samples: 2421028. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-28 18:14:20,750][2713170] Avg episode reward: [(0, '23.661')]
[2025-03-28 18:14:21,185][2761575] Updated weights for policy 0, policy_version 2380 (0.0006)
[2025-03-28 18:14:22,759][2761575] Updated weights for policy 0, policy_version 2390 (0.0006)
[2025-03-28 18:14:24,221][2761575] Updated weights for policy 0, policy_version 2400 (0.0006)
[2025-03-28 18:14:25,551][2761575] Updated weights for policy 0, policy_version 2410 (0.0006)
[2025-03-28 18:14:25,749][2713170] Fps is (10 sec: 27853.1, 60 sec: 26897.0, 300 sec: 26422.7). Total num frames: 9875456. Throughput: 0: 6755.1. Samples: 2461968. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-28 18:14:25,750][2713170] Avg episode reward: [(0, '24.863')]
[2025-03-28 18:14:26,917][2761575] Updated weights for policy 0, policy_version 2420 (0.0006)
[2025-03-28 18:14:28,389][2761575] Updated weights for policy 0, policy_version 2430 (0.0006)
[2025-03-28 18:14:29,879][2761575] Updated weights for policy 0, policy_version 2440 (0.0006)
[2025-03-28 18:14:30,749][2713170] Fps is (10 sec: 26214.2, 60 sec: 26828.8, 300 sec: 26394.9). Total num frames: 9998336. Throughput: 0: 6324.1. Samples: 2484310. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-28 18:14:30,750][2713170] Avg episode reward: [(0, '25.162')]
[2025-03-28 18:14:32,585][2761575] Updated weights for policy 0, policy_version 2450 (0.0008)
[2025-03-28 18:14:34,124][2761575] Updated weights for policy 0, policy_version 2460 (0.0006)
[2025-03-28 18:14:35,621][2761575] Updated weights for policy 0, policy_version 2470 (0.0006)
[2025-03-28 18:14:35,749][2713170] Fps is (10 sec: 24166.1, 60 sec: 26692.2, 300 sec: 26339.4). Total num frames: 10117120. Throughput: 0: 6577.6. Samples: 2517086. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-03-28 18:14:35,750][2713170] Avg episode reward: [(0, '24.141')]
[2025-03-28 18:14:37,120][2761575] Updated weights for policy 0, policy_version 2480 (0.0007)
[2025-03-28 18:14:38,635][2761575] Updated weights for policy 0, policy_version 2490 (0.0006)
[2025-03-28 18:14:40,105][2761575] Updated weights for policy 0, policy_version 2500 (0.0006)
[2025-03-28 18:14:40,749][2713170] Fps is (10 sec: 25804.7, 60 sec: 26692.2, 300 sec: 26367.1). Total num frames: 10256384. Throughput: 0: 6539.5. Samples: 2558034. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-28 18:14:40,750][2713170] Avg episode reward: [(0, '25.604')]
[2025-03-28 18:14:40,752][2761553] Saving new best policy, reward=25.604!
[2025-03-28 18:14:41,632][2761575] Updated weights for policy 0, policy_version 2510 (0.0006)
[2025-03-28 18:14:42,985][2761575] Updated weights for policy 0, policy_version 2520 (0.0006)
[2025-03-28 18:14:44,513][2761575] Updated weights for policy 0, policy_version 2530 (0.0006)
[2025-03-28 18:14:45,749][2713170] Fps is (10 sec: 27443.2, 60 sec: 26624.0, 300 sec: 26381.0). Total num frames: 10391552. Throughput: 0: 6995.8. Samples: 2599552. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-03-28 18:14:45,750][2713170] Avg episode reward: [(0, '24.231')]
[2025-03-28 18:14:46,063][2761575] Updated weights for policy 0, policy_version 2540 (0.0006)
[2025-03-28 18:14:47,590][2761575] Updated weights for policy 0, policy_version 2550 (0.0006)
[2025-03-28 18:14:49,112][2761575] Updated weights for policy 0, policy_version 2560 (0.0006)
[2025-03-28 18:14:50,602][2761575] Updated weights for policy 0, policy_version 2570 (0.0006)
[2025-03-28 18:14:50,749][2713170] Fps is (10 sec: 27033.6, 60 sec: 26487.4, 300 sec: 26381.0). Total num frames: 10526720. Throughput: 0: 6705.0. Samples: 2619642. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-28 18:14:50,750][2713170] Avg episode reward: [(0, '25.343')]
[2025-03-28 18:14:52,104][2761575] Updated weights for policy 0, policy_version 2580 (0.0006)
[2025-03-28 18:14:53,578][2761575] Updated weights for policy 0, policy_version 2590 (0.0006)
[2025-03-28 18:14:55,157][2761575] Updated weights for policy 0, policy_version 2600 (0.0006)
[2025-03-28 18:14:55,749][2713170] Fps is (10 sec: 27443.2, 60 sec: 26419.2, 300 sec: 26450.4). Total num frames: 10665984. Throughput: 0: 6680.1. Samples: 2660378. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-28 18:14:55,750][2713170] Avg episode reward: [(0, '25.284')]
[2025-03-28 18:14:56,629][2761575] Updated weights for policy 0, policy_version 2610 (0.0006)
[2025-03-28 18:14:58,201][2761575] Updated weights for policy 0, policy_version 2620 (0.0006)
[2025-03-28 18:14:59,737][2761575] Updated weights for policy 0, policy_version 2630 (0.0006)
[2025-03-28 18:15:00,749][2713170] Fps is (10 sec: 25395.1, 60 sec: 26351.0, 300 sec: 26367.1). Total num frames: 10780672. Throughput: 0: 6222.7. Samples: 2680392. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-03-28 18:15:00,751][2713170] Avg episode reward: [(0, '22.368')]
[2025-03-28 18:15:02,451][2761575] Updated weights for policy 0, policy_version 2640 (0.0008)
[2025-03-28 18:15:03,935][2761575] Updated weights for policy 0, policy_version 2650 (0.0006)
[2025-03-28 18:15:05,302][2761575] Updated weights for policy 0, policy_version 2660 (0.0006)
[2025-03-28 18:15:05,749][2713170] Fps is (10 sec: 23756.8, 60 sec: 26350.9, 300 sec: 26381.0). Total num frames: 10903552. Throughput: 0: 6486.5. Samples: 2712922. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-28 18:15:05,750][2713170] Avg episode reward: [(0, '26.731')]
[2025-03-28 18:15:05,773][2761553] Saving new best policy, reward=26.731!
[2025-03-28 18:15:06,803][2761575] Updated weights for policy 0, policy_version 2670 (0.0006)
[2025-03-28 18:15:08,239][2761575] Updated weights for policy 0, policy_version 2680 (0.0006)
[2025-03-28 18:15:09,657][2761575] Updated weights for policy 0, policy_version 2690 (0.0006)
[2025-03-28 18:15:10,749][2713170] Fps is (10 sec: 26624.0, 60 sec: 26419.2, 300 sec: 26492.1). Total num frames: 11046912. Throughput: 0: 6531.6. Samples: 2755890. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-28 18:15:10,751][2713170] Avg episode reward: [(0, '27.009')]
[2025-03-28 18:15:10,767][2761553] Saving new best policy, reward=27.009!
[2025-03-28 18:15:11,080][2761575] Updated weights for policy 0, policy_version 2700 (0.0006)
[2025-03-28 18:15:12,574][2761575] Updated weights for policy 0, policy_version 2710 (0.0006)
[2025-03-28 18:15:14,037][2761575] Updated weights for policy 0, policy_version 2720 (0.0006)
[2025-03-28 18:15:15,360][2761575] Updated weights for policy 0, policy_version 2730 (0.0006)
[2025-03-28 18:15:15,749][2713170] Fps is (10 sec: 28672.0, 60 sec: 26555.7, 300 sec: 26519.9). Total num frames: 11190272. Throughput: 0: 6990.3. Samples: 2798872. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-28 18:15:15,750][2713170] Avg episode reward: [(0, '25.415')]
[2025-03-28 18:15:16,862][2761575] Updated weights for policy 0, policy_version 2740 (0.0006)
[2025-03-28 18:15:18,246][2761575] Updated weights for policy 0, policy_version 2750 (0.0006)
[2025-03-28 18:15:19,774][2761575] Updated weights for policy 0, policy_version 2760 (0.0006)
[2025-03-28 18:15:20,749][2713170] Fps is (10 sec: 28262.5, 60 sec: 26555.7, 300 sec: 26547.6). Total num frames: 11329536. Throughput: 0: 6735.9. Samples: 2820200. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2025-03-28 18:15:20,750][2713170] Avg episode reward: [(0, '24.524')]
[2025-03-28 18:15:21,229][2761575] Updated weights for policy 0, policy_version 2770 (0.0006)
[2025-03-28 18:15:22,769][2761575] Updated weights for policy 0, policy_version 2780 (0.0006)
[2025-03-28 18:15:24,212][2761575] Updated weights for policy 0, policy_version 2790 (0.0006)
[2025-03-28 18:15:25,687][2761575] Updated weights for policy 0, policy_version 2800 (0.0006)
[2025-03-28 18:15:25,749][2713170] Fps is (10 sec: 27852.8, 60 sec: 26555.7, 300 sec: 26603.2). Total num frames: 11468800. Throughput: 0: 6734.6. Samples: 2861092. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-28 18:15:25,750][2713170] Avg episode reward: [(0, '23.794')]
[2025-03-28 18:15:27,128][2761575] Updated weights for policy 0, policy_version 2810 (0.0006)
[2025-03-28 18:15:28,658][2761575] Updated weights for policy 0, policy_version 2820 (0.0007)
[2025-03-28 18:15:30,115][2761575] Updated weights for policy 0, policy_version 2830 (0.0006)
[2025-03-28 18:15:30,749][2713170] Fps is (10 sec: 27852.8, 60 sec: 26828.8, 300 sec: 26630.9). Total num frames: 11608064. Throughput: 0: 6735.0. Samples: 2902628. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-28 18:15:30,750][2713170] Avg episode reward: [(0, '26.945')]
[2025-03-28 18:15:31,660][2761575] Updated weights for policy 0, policy_version 2840 (0.0006)
[2025-03-28 18:15:33,175][2761575] Updated weights for policy 0, policy_version 2850 (0.0006)
[2025-03-28 18:15:34,678][2761575] Updated weights for policy 0, policy_version 2860 (0.0006)
[2025-03-28 18:15:35,749][2713170] Fps is (10 sec: 27033.6, 60 sec: 27033.6, 300 sec: 26617.0). Total num frames: 11739136. Throughput: 0: 6735.7. Samples: 2922748. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-03-28 18:15:35,750][2713170] Avg episode reward: [(0, '26.052')]
[2025-03-28 18:15:36,210][2761575] Updated weights for policy 0, policy_version 2870 (0.0006)
[2025-03-28 18:15:37,696][2761575] Updated weights for policy 0, policy_version 2880 (0.0006)
[2025-03-28 18:15:39,233][2761575] Updated weights for policy 0, policy_version 2890 (0.0006)
[2025-03-28 18:15:40,721][2761575] Updated weights for policy 0, policy_version 2900 (0.0006)
[2025-03-28 18:15:40,749][2713170] Fps is (10 sec: 27033.6, 60 sec: 27033.6, 300 sec: 26644.8). Total num frames: 11878400. Throughput: 0: 6736.8. Samples: 2963532. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2025-03-28 18:15:40,750][2713170] Avg episode reward: [(0, '24.499')]
[2025-03-28 18:15:42,186][2761575] Updated weights for policy 0, policy_version 2910 (0.0006)
[2025-03-28 18:15:43,692][2761575] Updated weights for policy 0, policy_version 2920 (0.0006)
[2025-03-28 18:15:45,231][2761575] Updated weights for policy 0, policy_version 2930 (0.0006)
[2025-03-28 18:15:45,749][2713170] Fps is (10 sec: 27443.0, 60 sec: 27033.6, 300 sec: 26644.8). Total num frames: 12013568. Throughput: 0: 7199.3. Samples: 3004362. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-28 18:15:45,751][2713170] Avg episode reward: [(0, '26.088')]
[2025-03-28 18:15:46,752][2761575] Updated weights for policy 0, policy_version 2940 (0.0006)
[2025-03-28 18:15:48,292][2761575] Updated weights for policy 0, policy_version 2950 (0.0006)
[2025-03-28 18:15:49,786][2761575] Updated weights for policy 0, policy_version 2960 (0.0006)
[2025-03-28 18:15:50,749][2713170] Fps is (10 sec: 27033.6, 60 sec: 27033.6, 300 sec: 26644.8). Total num frames: 12148736. Throughput: 0: 6925.6. Samples: 3024574. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-28 18:15:50,751][2713170] Avg episode reward: [(0, '25.095')]
[2025-03-28 18:15:51,213][2761575] Updated weights for policy 0, policy_version 2970 (0.0006)
[2025-03-28 18:15:52,807][2761575] Updated weights for policy 0, policy_version 2980 (0.0006)
[2025-03-28 18:15:54,287][2761575] Updated weights for policy 0, policy_version 2990 (0.0006)
[2025-03-28 18:15:55,749][2713170] Fps is (10 sec: 27033.7, 60 sec: 26965.3, 300 sec: 26700.4). Total num frames: 12283904. Throughput: 0: 6880.2. Samples: 3065498. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-28 18:15:55,750][2713170] Avg episode reward: [(0, '25.314')]
[2025-03-28 18:15:55,757][2761553] Saving /home/stalaei/projects/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000002999_12283904.pth...
[2025-03-28 18:15:55,866][2761553] Removing /home/stalaei/projects/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000001439_5894144.pth
[2025-03-28 18:15:55,886][2761575] Updated weights for policy 0, policy_version 3000 (0.0006)
[2025-03-28 18:15:57,349][2761575] Updated weights for policy 0, policy_version 3010 (0.0006)
[2025-03-28 18:15:58,817][2761575] Updated weights for policy 0, policy_version 3020 (0.0006)
[2025-03-28 18:16:00,313][2761575] Updated weights for policy 0, policy_version 3030 (0.0006)
[2025-03-28 18:16:00,749][2713170] Fps is (10 sec: 27443.2, 60 sec: 27374.9, 300 sec: 26728.1). Total num frames: 12423168. Throughput: 0: 6836.8. Samples: 3106526. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-28 18:16:00,750][2713170] Avg episode reward: [(0, '25.221')]
[2025-03-28 18:16:01,761][2761575] Updated weights for policy 0, policy_version 3040 (0.0006)
[2025-03-28 18:16:03,274][2761575] Updated weights for policy 0, policy_version 3050 (0.0006)
[2025-03-28 18:16:04,795][2761575] Updated weights for policy 0, policy_version 3060 (0.0006)
[2025-03-28 18:16:05,749][2713170] Fps is (10 sec: 27443.4, 60 sec: 27579.7, 300 sec: 26728.1). Total num frames: 12558336. Throughput: 0: 6814.5. Samples: 3126852. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-28 18:16:05,750][2713170] Avg episode reward: [(0, '27.344')]
[2025-03-28 18:16:05,755][2761553] Saving new best policy, reward=27.344!
[2025-03-28 18:16:06,259][2761575] Updated weights for policy 0, policy_version 3070 (0.0007)
[2025-03-28 18:16:07,628][2761575] Updated weights for policy 0, policy_version 3080 (0.0006)
[2025-03-28 18:16:09,155][2761575] Updated weights for policy 0, policy_version 3090 (0.0006)
[2025-03-28 18:16:10,636][2761575] Updated weights for policy 0, policy_version 3100 (0.0006)
[2025-03-28 18:16:10,749][2713170] Fps is (10 sec: 27443.3, 60 sec: 27511.5, 300 sec: 26714.2). Total num frames: 12697600. Throughput: 0: 6843.6. Samples: 3169056. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-03-28 18:16:10,750][2713170] Avg episode reward: [(0, '25.365')]
[2025-03-28 18:16:12,131][2761575] Updated weights for policy 0, policy_version 3110 (0.0006)
[2025-03-28 18:16:13,575][2761575] Updated weights for policy 0, policy_version 3120 (0.0006)
[2025-03-28 18:16:14,983][2761575] Updated weights for policy 0, policy_version 3130 (0.0006)
[2025-03-28 18:16:15,749][2713170] Fps is (10 sec: 28262.1, 60 sec: 27511.4, 300 sec: 26742.0). Total num frames: 12840960. Throughput: 0: 6868.2. Samples: 3211696. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-28 18:16:15,752][2713170] Avg episode reward: [(0, '23.905')]
[2025-03-28 18:16:16,381][2761575] Updated weights for policy 0, policy_version 3140 (0.0006)
[2025-03-28 18:16:17,845][2761575] Updated weights for policy 0, policy_version 3150 (0.0006)
[2025-03-28 18:16:19,329][2761575] Updated weights for policy 0, policy_version 3160 (0.0006)
[2025-03-28 18:16:20,749][2713170] Fps is (10 sec: 28262.4, 60 sec: 27511.5, 300 sec: 26755.9). Total num frames: 12980224. Throughput: 0: 6885.0. Samples: 3232574. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-28 18:16:20,750][2713170] Avg episode reward: [(0, '27.121')]
[2025-03-28 18:16:20,813][2761575] Updated weights for policy 0, policy_version 3170 (0.0006)
[2025-03-28 18:16:22,327][2761575] Updated weights for policy 0, policy_version 3180 (0.0006)
[2025-03-28 18:16:23,849][2761575] Updated weights for policy 0, policy_version 3190 (0.0006)
[2025-03-28 18:16:25,314][2761575] Updated weights for policy 0, policy_version 3200 (0.0006)
[2025-03-28 18:16:25,749][2713170] Fps is (10 sec: 27853.1, 60 sec: 27511.5, 300 sec: 26825.3). Total num frames: 13119488. Throughput: 0: 6886.4. Samples: 3273418. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-28 18:16:25,750][2713170] Avg episode reward: [(0, '24.838')]
[2025-03-28 18:16:26,604][2761575] Updated weights for policy 0, policy_version 3210 (0.0006)
[2025-03-28 18:16:28,120][2761575] Updated weights for policy 0, policy_version 3220 (0.0006)
[2025-03-28 18:16:29,514][2761575] Updated weights for policy 0, policy_version 3230 (0.0006)
[2025-03-28 18:16:30,749][2713170] Fps is (10 sec: 28262.4, 60 sec: 27579.7, 300 sec: 26867.0). Total num frames: 13262848. Throughput: 0: 6941.4. Samples: 3316724. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-28 18:16:30,750][2713170] Avg episode reward: [(0, '23.754')]
[2025-03-28 18:16:30,980][2761575] Updated weights for policy 0, policy_version 3240 (0.0007)
[2025-03-28 18:16:32,367][2761575] Updated weights for policy 0, policy_version 3250 (0.0006)
[2025-03-28 18:16:33,822][2761575] Updated weights for policy 0, policy_version 3260 (0.0006)
[2025-03-28 18:16:35,145][2761575] Updated weights for policy 0, policy_version 3270 (0.0006)
[2025-03-28 18:16:35,749][2713170] Fps is (10 sec: 29081.8, 60 sec: 27852.8, 300 sec: 26908.6). Total num frames: 13410304. Throughput: 0: 6971.9. Samples: 3338308. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-03-28 18:16:35,750][2713170] Avg episode reward: [(0, '29.695')]
[2025-03-28 18:16:35,753][2761553] Saving new best policy, reward=29.695!
[2025-03-28 18:16:36,567][2761575] Updated weights for policy 0, policy_version 3280 (0.0006)
[2025-03-28 18:16:38,027][2761575] Updated weights for policy 0, policy_version 3290 (0.0006)
[2025-03-28 18:16:39,514][2761575] Updated weights for policy 0, policy_version 3300 (0.0006)
[2025-03-28 18:16:40,749][2713170] Fps is (10 sec: 28672.2, 60 sec: 27852.8, 300 sec: 26908.6). Total num frames: 13549568. Throughput: 0: 7021.8. Samples: 3381476. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-28 18:16:40,750][2713170] Avg episode reward: [(0, '25.651')]
[2025-03-28 18:16:40,957][2761575] Updated weights for policy 0, policy_version 3310 (0.0007)
[2025-03-28 18:16:42,424][2761575] Updated weights for policy 0, policy_version 3320 (0.0006)
[2025-03-28 18:16:43,873][2761575] Updated weights for policy 0, policy_version 3330 (0.0006)
[2025-03-28 18:16:45,328][2761575] Updated weights for policy 0, policy_version 3340 (0.0007)
[2025-03-28 18:16:45,749][2713170] Fps is (10 sec: 27852.9, 60 sec: 27921.2, 300 sec: 26908.6). Total num frames: 13688832. Throughput: 0: 7049.4. Samples: 3423748. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-28 18:16:45,750][2713170] Avg episode reward: [(0, '29.948')]
[2025-03-28 18:16:45,753][2761553] Saving new best policy, reward=29.948!
[2025-03-28 18:16:46,803][2761575] Updated weights for policy 0, policy_version 3350 (0.0006)
[2025-03-28 18:16:48,250][2761575] Updated weights for policy 0, policy_version 3360 (0.0007)
[2025-03-28 18:16:49,685][2761575] Updated weights for policy 0, policy_version 3370 (0.0006)
[2025-03-28 18:16:50,749][2713170] Fps is (10 sec: 28262.2, 60 sec: 28057.6, 300 sec: 26908.6). Total num frames: 13832192. Throughput: 0: 7068.4. Samples: 3444930. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-28 18:16:50,750][2713170] Avg episode reward: [(0, '25.852')]
[2025-03-28 18:16:51,036][2761575] Updated weights for policy 0, policy_version 3380 (0.0006)
[2025-03-28 18:16:52,457][2761575] Updated weights for policy 0, policy_version 3390 (0.0007)
[2025-03-28 18:16:53,965][2761575] Updated weights for policy 0, policy_version 3400 (0.0007)
[2025-03-28 18:16:55,414][2761575] Updated weights for policy 0, policy_version 3410 (0.0007)
[2025-03-28 18:16:55,749][2713170] Fps is (10 sec: 28671.6, 60 sec: 28194.2, 300 sec: 26991.9). Total num frames: 13975552. Throughput: 0: 7082.7. Samples: 3487778. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-28 18:16:55,750][2713170] Avg episode reward: [(0, '24.025')]
[2025-03-28 18:16:56,833][2761575] Updated weights for policy 0, policy_version 3420 (0.0007)
[2025-03-28 18:16:58,327][2761575] Updated weights for policy 0, policy_version 3430 (0.0006)
[2025-03-28 18:16:59,818][2761575] Updated weights for policy 0, policy_version 3440 (0.0006)
[2025-03-28 18:17:00,749][2713170] Fps is (10 sec: 28262.5, 60 sec: 28194.2, 300 sec: 27019.7). Total num frames: 14114816. Throughput: 0: 7063.4. Samples: 3529550. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-28 18:17:00,750][2713170] Avg episode reward: [(0, '24.609')]
[2025-03-28 18:17:01,296][2761575] Updated weights for policy 0, policy_version 3450 (0.0006)
[2025-03-28 18:17:02,727][2761575] Updated weights for policy 0, policy_version 3460 (0.0006)
[2025-03-28 18:17:04,201][2761575] Updated weights for policy 0, policy_version 3470 (0.0006)
[2025-03-28 18:17:05,645][2761575] Updated weights for policy 0, policy_version 3480 (0.0006)
[2025-03-28 18:17:05,749][2713170] Fps is (10 sec: 27852.9, 60 sec: 28262.4, 300 sec: 27019.7). Total num frames: 14254080. Throughput: 0: 7069.8. Samples: 3550716. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-28 18:17:05,750][2713170] Avg episode reward: [(0, '30.324')]
[2025-03-28 18:17:05,757][2761553] Saving new best policy, reward=30.324!
[2025-03-28 18:17:07,144][2761575] Updated weights for policy 0, policy_version 3490 (0.0007)
[2025-03-28 18:17:08,662][2761575] Updated weights for policy 0, policy_version 3500 (0.0007)
[2025-03-28 18:17:10,149][2761575] Updated weights for policy 0, policy_version 3510 (0.0007)
[2025-03-28 18:17:10,749][2713170] Fps is (10 sec: 26214.6, 60 sec: 27989.4, 300 sec: 26964.2). Total num frames: 14376960. Throughput: 0: 7082.8. Samples: 3592142. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-28 18:17:10,749][2713170] Avg episode reward: [(0, '23.140')]
[2025-03-28 18:17:12,128][2761575] Updated weights for policy 0, policy_version 3520 (0.0006)
[2025-03-28 18:17:13,651][2761575] Updated weights for policy 0, policy_version 3530 (0.0006)
[2025-03-28 18:17:15,108][2761575] Updated weights for policy 0, policy_version 3540 (0.0006)
[2025-03-28 18:17:15,749][2713170] Fps is (10 sec: 26214.4, 60 sec: 27921.1, 300 sec: 26964.2). Total num frames: 14516224. Throughput: 0: 6958.9. Samples: 3629876. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-28 18:17:15,751][2713170] Avg episode reward: [(0, '27.505')]
[2025-03-28 18:17:16,608][2761575] Updated weights for policy 0, policy_version 3550 (0.0007)
[2025-03-28 18:17:18,048][2761575] Updated weights for policy 0, policy_version 3560 (0.0007)
[2025-03-28 18:17:19,369][2761575] Updated weights for policy 0, policy_version 3570 (0.0006)
[2025-03-28 18:17:20,749][2713170] Fps is (10 sec: 28262.1, 60 sec: 27989.3, 300 sec: 26950.3). Total num frames: 14659584. Throughput: 0: 6960.4. Samples: 3651528. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-28 18:17:20,751][2713170] Avg episode reward: [(0, '29.310')]
[2025-03-28 18:17:20,791][2761575] Updated weights for policy 0, policy_version 3580 (0.0006)
[2025-03-28 18:17:22,289][2761575] Updated weights for policy 0, policy_version 3590 (0.0006)
[2025-03-28 18:17:23,740][2761575] Updated weights for policy 0, policy_version 3600 (0.0007)
[2025-03-28 18:17:25,167][2761575] Updated weights for policy 0, policy_version 3610 (0.0006)
[2025-03-28 18:17:25,749][2713170] Fps is (10 sec: 28672.0, 60 sec: 28057.6, 300 sec: 27019.7). Total num frames: 14802944. Throughput: 0: 6950.7. Samples: 3694256. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-28 18:17:25,750][2713170] Avg episode reward: [(0, '28.705')]
[2025-03-28 18:17:26,530][2761575] Updated weights for policy 0, policy_version 3620 (0.0006)
[2025-03-28 18:17:27,910][2761575] Updated weights for policy 0, policy_version 3630 (0.0006)
[2025-03-28 18:17:29,310][2761575] Updated weights for policy 0, policy_version 3640 (0.0006)
[2025-03-28 18:17:30,711][2761575] Updated weights for policy 0, policy_version 3650 (0.0006)
[2025-03-28 18:17:30,749][2713170] Fps is (10 sec: 29081.6, 60 sec: 28125.9, 300 sec: 27089.1). Total num frames: 14950400. Throughput: 0: 6996.1. Samples: 3738574. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-03-28 18:17:30,750][2713170] Avg episode reward: [(0, '27.222')]
[2025-03-28 18:17:32,132][2761575] Updated weights for policy 0, policy_version 3660 (0.0006)
[2025-03-28 18:17:33,589][2761575] Updated weights for policy 0, policy_version 3670 (0.0006)
[2025-03-28 18:17:35,105][2761575] Updated weights for policy 0, policy_version 3680 (0.0007)
[2025-03-28 18:17:35,749][2713170] Fps is (10 sec: 28671.9, 60 sec: 27989.3, 300 sec: 27089.1). Total num frames: 15089664. Throughput: 0: 6997.4. Samples: 3759812. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-28 18:17:35,751][2713170] Avg episode reward: [(0, '28.245')]
[2025-03-28 18:17:36,637][2761575] Updated weights for policy 0, policy_version 3690 (0.0006)
[2025-03-28 18:17:38,191][2761575] Updated weights for policy 0, policy_version 3700 (0.0007)
[2025-03-28 18:17:39,703][2761575] Updated weights for policy 0, policy_version 3710 (0.0007)
[2025-03-28 18:17:40,749][2713170] Fps is (10 sec: 27443.2, 60 sec: 27921.0, 300 sec: 27116.9). Total num frames: 15224832. Throughput: 0: 6938.4. Samples: 3800006. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-28 18:17:40,750][2713170] Avg episode reward: [(0, '30.611')]
[2025-03-28 18:17:40,752][2761553] Saving new best policy, reward=30.611!
[2025-03-28 18:17:41,157][2761575] Updated weights for policy 0, policy_version 3720 (0.0006)
[2025-03-28 18:17:42,693][2761575] Updated weights for policy 0, policy_version 3730 (0.0006)
[2025-03-28 18:17:44,161][2761575] Updated weights for policy 0, policy_version 3740 (0.0006)
[2025-03-28 18:17:45,710][2761575] Updated weights for policy 0, policy_version 3750 (0.0006)
[2025-03-28 18:17:45,749][2713170] Fps is (10 sec: 27033.6, 60 sec: 27852.7, 300 sec: 27130.8). Total num frames: 15360000. Throughput: 0: 6917.6. Samples: 3840844. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-28 18:17:45,750][2713170] Avg episode reward: [(0, '27.316')]
[2025-03-28 18:17:47,229][2761575] Updated weights for policy 0, policy_version 3760 (0.0006)
[2025-03-28 18:17:48,757][2761575] Updated weights for policy 0, policy_version 3770 (0.0006)
[2025-03-28 18:17:50,309][2761575] Updated weights for policy 0, policy_version 3780 (0.0006)
[2025-03-28 18:17:50,749][2713170] Fps is (10 sec: 27033.6, 60 sec: 27716.3, 300 sec: 27144.7). Total num frames: 15495168. Throughput: 0: 6894.2. Samples: 3860954. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-28 18:17:50,750][2713170] Avg episode reward: [(0, '24.025')]
[2025-03-28 18:17:51,869][2761575] Updated weights for policy 0, policy_version 3790 (0.0006)
[2025-03-28 18:17:53,413][2761575] Updated weights for policy 0, policy_version 3800 (0.0006)
[2025-03-28 18:17:54,896][2761575] Updated weights for policy 0, policy_version 3810 (0.0006)
[2025-03-28 18:17:55,749][2713170] Fps is (10 sec: 26623.8, 60 sec: 27511.4, 300 sec: 27214.1). Total num frames: 15626240. Throughput: 0: 6866.6. Samples: 3901140. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-28 18:17:55,750][2713170] Avg episode reward: [(0, '27.859')]
[2025-03-28 18:17:55,757][2761553] Saving /home/stalaei/projects/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000003815_15626240.pth...
[2025-03-28 18:17:55,861][2761553] Removing /home/stalaei/projects/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000002217_9080832.pth
[2025-03-28 18:17:56,538][2761575] Updated weights for policy 0, policy_version 3820 (0.0007)
[2025-03-28 18:17:58,075][2761575] Updated weights for policy 0, policy_version 3830 (0.0007)
[2025-03-28 18:17:59,667][2761575] Updated weights for policy 0, policy_version 3840 (0.0007)
[2025-03-28 18:18:00,749][2713170] Fps is (10 sec: 25804.8, 60 sec: 27306.6, 300 sec: 27241.9). Total num frames: 15753216. Throughput: 0: 6866.9. Samples: 3938886. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-28 18:18:00,750][2713170] Avg episode reward: [(0, '28.161')]
[2025-03-28 18:18:01,340][2761575] Updated weights for policy 0, policy_version 3850 (0.0007)
[2025-03-28 18:18:02,968][2761575] Updated weights for policy 0, policy_version 3860 (0.0006)
[2025-03-28 18:18:04,651][2761575] Updated weights for policy 0, policy_version 3870 (0.0007)
[2025-03-28 18:18:05,749][2713170] Fps is (10 sec: 24985.7, 60 sec: 27033.6, 300 sec: 27214.1). Total num frames: 15876096. Throughput: 0: 6808.7. Samples: 3957918. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-28 18:18:05,751][2713170] Avg episode reward: [(0, '26.752')]
[2025-03-28 18:18:06,272][2761575] Updated weights for policy 0, policy_version 3880 (0.0007)
[2025-03-28 18:18:07,847][2761575] Updated weights for policy 0, policy_version 3890 (0.0007)
[2025-03-28 18:18:09,400][2761575] Updated weights for policy 0, policy_version 3900 (0.0007)
[2025-03-28 18:18:10,749][2713170] Fps is (10 sec: 24166.6, 60 sec: 26965.3, 300 sec: 27144.7). Total num frames: 15994880. Throughput: 0: 6714.1. Samples: 3996392. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-03-28 18:18:10,750][2713170] Avg episode reward: [(0, '28.720')]
[2025-03-28 18:18:11,964][2761575] Updated weights for policy 0, policy_version 3910 (0.0007)
[2025-03-28 18:18:13,441][2761575] Updated weights for policy 0, policy_version 3920 (0.0007)
[2025-03-28 18:18:14,850][2761575] Updated weights for policy 0, policy_version 3930 (0.0006)
[2025-03-28 18:18:15,749][2713170] Fps is (10 sec: 24166.4, 60 sec: 26692.2, 300 sec: 27116.9). Total num frames: 16117760. Throughput: 0: 6498.7. Samples: 4031014. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-28 18:18:15,751][2713170] Avg episode reward: [(0, '26.314')]
[2025-03-28 18:18:16,341][2761575] Updated weights for policy 0, policy_version 3940 (0.0006)
[2025-03-28 18:18:17,708][2761575] Updated weights for policy 0, policy_version 3950 (0.0006)
[2025-03-28 18:18:18,992][2761575] Updated weights for policy 0, policy_version 3960 (0.0006)
[2025-03-28 18:18:20,331][2761575] Updated weights for policy 0, policy_version 3970 (0.0007)
[2025-03-28 18:18:20,749][2713170] Fps is (10 sec: 27442.9, 60 sec: 26828.8, 300 sec: 27144.7). Total num frames: 16269312. Throughput: 0: 6533.2. Samples: 4053804. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-28 18:18:20,751][2713170] Avg episode reward: [(0, '25.039')]
[2025-03-28 18:18:21,876][2761575] Updated weights for policy 0, policy_version 3980 (0.0006)
[2025-03-28 18:18:23,401][2761575] Updated weights for policy 0, policy_version 3990 (0.0006)
[2025-03-28 18:18:24,797][2761575] Updated weights for policy 0, policy_version 4000 (0.0006)
[2025-03-28 18:18:25,749][2713170] Fps is (10 sec: 29081.7, 60 sec: 26760.5, 300 sec: 27186.3). Total num frames: 16408576. Throughput: 0: 6586.3. Samples: 4096388. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-28 18:18:25,751][2713170] Avg episode reward: [(0, '26.266')]
[2025-03-28 18:18:26,277][2761575] Updated weights for policy 0, policy_version 4010 (0.0006)
[2025-03-28 18:18:27,871][2761575] Updated weights for policy 0, policy_version 4020 (0.0006)
[2025-03-28 18:18:29,417][2761575] Updated weights for policy 0, policy_version 4030 (0.0007)
[2025-03-28 18:18:30,749][2713170] Fps is (10 sec: 26624.2, 60 sec: 26419.2, 300 sec: 27186.3). Total num frames: 16535552. Throughput: 0: 6526.5. Samples: 4134534. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-03-28 18:18:30,750][2713170] Avg episode reward: [(0, '25.414')]
[2025-03-28 18:18:31,171][2761575] Updated weights for policy 0, policy_version 4040 (0.0008)
[2025-03-28 18:18:32,693][2761575] Updated weights for policy 0, policy_version 4050 (0.0007)
[2025-03-28 18:18:34,165][2761575] Updated weights for policy 0, policy_version 4060 (0.0006)
[2025-03-28 18:18:35,669][2761575] Updated weights for policy 0, policy_version 4070 (0.0006)
[2025-03-28 18:18:35,749][2713170] Fps is (10 sec: 26214.3, 60 sec: 26350.9, 300 sec: 27172.4). Total num frames: 16670720. Throughput: 0: 6537.7. Samples: 4155152. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-03-28 18:18:35,750][2713170] Avg episode reward: [(0, '26.396')]
[2025-03-28 18:18:37,201][2761575] Updated weights for policy 0, policy_version 4080 (0.0006)
[2025-03-28 18:18:38,743][2761575] Updated weights for policy 0, policy_version 4090 (0.0006)
[2025-03-28 18:18:40,314][2761575] Updated weights for policy 0, policy_version 4100 (0.0006)
[2025-03-28 18:18:40,749][2713170] Fps is (10 sec: 26623.8, 60 sec: 26282.7, 300 sec: 27144.7). Total num frames: 16801792. Throughput: 0: 6534.5. Samples: 4195190. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-28 18:18:40,750][2713170] Avg episode reward: [(0, '27.750')]
[2025-03-28 18:18:41,874][2761575] Updated weights for policy 0, policy_version 4110 (0.0006)
[2025-03-28 18:18:43,477][2761575] Updated weights for policy 0, policy_version 4120 (0.0006)
[2025-03-28 18:18:45,123][2761575] Updated weights for policy 0, policy_version 4130 (0.0006)
[2025-03-28 18:18:45,749][2713170] Fps is (10 sec: 25804.6, 60 sec: 26146.1, 300 sec: 27089.1). Total num frames: 16928768. Throughput: 0: 6553.1. Samples: 4233776. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-28 18:18:45,750][2713170] Avg episode reward: [(0, '26.994')]
[2025-03-28 18:18:46,696][2761575] Updated weights for policy 0, policy_version 4140 (0.0007)
[2025-03-28 18:18:48,385][2761575] Updated weights for policy 0, policy_version 4150 (0.0007)
[2025-03-28 18:18:50,041][2761575] Updated weights for policy 0, policy_version 4160 (0.0006)
[2025-03-28 18:18:50,749][2713170] Fps is (10 sec: 25395.2, 60 sec: 26009.6, 300 sec: 27033.6). Total num frames: 17055744. Throughput: 0: 6546.2. Samples: 4252496. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-28 18:18:50,750][2713170] Avg episode reward: [(0, '27.719')]
[2025-03-28 18:18:51,663][2761575] Updated weights for policy 0, policy_version 4170 (0.0006)
[2025-03-28 18:18:53,235][2761575] Updated weights for policy 0, policy_version 4180 (0.0006)
[2025-03-28 18:18:54,839][2761575] Updated weights for policy 0, policy_version 4190 (0.0006)
[2025-03-28 18:18:55,749][2713170] Fps is (10 sec: 25395.5, 60 sec: 25941.4, 300 sec: 27061.4). Total num frames: 17182720. Throughput: 0: 6537.7. Samples: 4290588. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-28 18:18:55,750][2713170] Avg episode reward: [(0, '28.506')]
[2025-03-28 18:18:56,455][2761575] Updated weights for policy 0, policy_version 4200 (0.0006)
[2025-03-28 18:18:57,990][2761575] Updated weights for policy 0, policy_version 4210 (0.0006)
[2025-03-28 18:18:59,561][2761575] Updated weights for policy 0, policy_version 4220 (0.0006)
[2025-03-28 18:19:00,749][2713170] Fps is (10 sec: 25395.4, 60 sec: 25941.3, 300 sec: 27075.3). Total num frames: 17309696. Throughput: 0: 6593.1. Samples: 4327702. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-03-28 18:19:00,750][2713170] Avg episode reward: [(0, '30.917')]
[2025-03-28 18:19:00,752][2761553] Saving new best policy, reward=30.917!
[2025-03-28 18:19:01,397][2761575] Updated weights for policy 0, policy_version 4230 (0.0009)
[2025-03-28 18:19:02,841][2761575] Updated weights for policy 0, policy_version 4240 (0.0007)
[2025-03-28 18:19:04,253][2761575] Updated weights for policy 0, policy_version 4250 (0.0006)
[2025-03-28 18:19:05,749][2713170] Fps is (10 sec: 26214.4, 60 sec: 26146.2, 300 sec: 27061.4). Total num frames: 17444864. Throughput: 0: 6556.7. Samples: 4348854. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-28 18:19:05,751][2713170] Avg episode reward: [(0, '26.407')]
[2025-03-28 18:19:05,798][2761575] Updated weights for policy 0, policy_version 4260 (0.0006)
[2025-03-28 18:19:07,289][2761575] Updated weights for policy 0, policy_version 4270 (0.0007)
[2025-03-28 18:19:08,852][2761575] Updated weights for policy 0, policy_version 4280 (0.0006)
[2025-03-28 18:19:10,749][2713170] Fps is (10 sec: 25804.8, 60 sec: 26214.4, 300 sec: 27019.7). Total num frames: 17567744. Throughput: 0: 6504.4. Samples: 4389084. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-28 18:19:10,750][2713170] Avg episode reward: [(0, '27.930')]
[2025-03-28 18:19:11,644][2761575] Updated weights for policy 0, policy_version 4290 (0.0006)
[2025-03-28 18:19:13,181][2761575] Updated weights for policy 0, policy_version 4300 (0.0006)
[2025-03-28 18:19:14,513][2761575] Updated weights for policy 0, policy_version 4310 (0.0006)
[2025-03-28 18:19:15,749][2713170] Fps is (10 sec: 24166.4, 60 sec: 26146.1, 300 sec: 26950.3). Total num frames: 17686528. Throughput: 0: 6405.9. Samples: 4422798. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-03-28 18:19:15,750][2713170] Avg episode reward: [(0, '26.808')]
[2025-03-28 18:19:15,964][2761575] Updated weights for policy 0, policy_version 4320 (0.0007)
[2025-03-28 18:19:17,450][2761575] Updated weights for policy 0, policy_version 4330 (0.0007)
[2025-03-28 18:19:18,958][2761575] Updated weights for policy 0, policy_version 4340 (0.0006)
[2025-03-28 18:19:20,313][2761575] Updated weights for policy 0, policy_version 4350 (0.0006)
[2025-03-28 18:19:20,749][2713170] Fps is (10 sec: 25804.7, 60 sec: 25941.3, 300 sec: 26950.3). Total num frames: 17825792. Throughput: 0: 6404.9. Samples: 4443370. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-28 18:19:20,750][2713170] Avg episode reward: [(0, '28.902')]
[2025-03-28 18:19:21,858][2761575] Updated weights for policy 0, policy_version 4360 (0.0006)
[2025-03-28 18:19:23,357][2761575] Updated weights for policy 0, policy_version 4370 (0.0006)
[2025-03-28 18:19:24,842][2761575] Updated weights for policy 0, policy_version 4380 (0.0007)
[2025-03-28 18:19:25,749][2713170] Fps is (10 sec: 27443.2, 60 sec: 25873.1, 300 sec: 26991.9). Total num frames: 17960960. Throughput: 0: 6443.9. Samples: 4485166. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-28 18:19:25,750][2713170] Avg episode reward: [(0, '25.973')]
[2025-03-28 18:19:26,422][2761575] Updated weights for policy 0, policy_version 4390 (0.0006)
[2025-03-28 18:19:28,062][2761575] Updated weights for policy 0, policy_version 4400 (0.0007)
[2025-03-28 18:19:29,587][2761575] Updated weights for policy 0, policy_version 4410 (0.0006)
[2025-03-28 18:19:30,749][2713170] Fps is (10 sec: 24985.6, 60 sec: 25668.3, 300 sec: 26978.1). Total num frames: 18075648. Throughput: 0: 6383.6. Samples: 4521038. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-28 18:19:30,750][2713170] Avg episode reward: [(0, '26.948')]
[2025-03-28 18:19:31,585][2761575] Updated weights for policy 0, policy_version 4420 (0.0007)
[2025-03-28 18:19:33,028][2761575] Updated weights for policy 0, policy_version 4430 (0.0006)
[2025-03-28 18:19:34,489][2761575] Updated weights for policy 0, policy_version 4440 (0.0006)
[2025-03-28 18:19:35,749][2713170] Fps is (10 sec: 26214.4, 60 sec: 25873.1, 300 sec: 27005.8). Total num frames: 18223104. Throughput: 0: 6435.7. Samples: 4542104. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-28 18:19:35,750][2713170] Avg episode reward: [(0, '28.764')]
[2025-03-28 18:19:35,869][2761575] Updated weights for policy 0, policy_version 4450 (0.0006)
[2025-03-28 18:19:37,485][2761575] Updated weights for policy 0, policy_version 4460 (0.0006)
[2025-03-28 18:19:38,969][2761575] Updated weights for policy 0, policy_version 4470 (0.0006)
[2025-03-28 18:19:40,601][2761575] Updated weights for policy 0, policy_version 4480 (0.0007)
[2025-03-28 18:19:40,749][2713170] Fps is (10 sec: 27443.3, 60 sec: 25804.8, 300 sec: 26978.1). Total num frames: 18350080. Throughput: 0: 6492.3. Samples: 4582742. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-28 18:19:40,750][2713170] Avg episode reward: [(0, '30.547')]
[2025-03-28 18:19:42,068][2761575] Updated weights for policy 0, policy_version 4490 (0.0007)
[2025-03-28 18:19:43,555][2761575] Updated weights for policy 0, policy_version 4500 (0.0007)
[2025-03-28 18:19:44,986][2761575] Updated weights for policy 0, policy_version 4510 (0.0006)
[2025-03-28 18:19:45,749][2713170] Fps is (10 sec: 26623.8, 60 sec: 26009.6, 300 sec: 26991.9). Total num frames: 18489344. Throughput: 0: 6582.0. Samples: 4623892. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-28 18:19:45,751][2713170] Avg episode reward: [(0, '26.035')]
[2025-03-28 18:19:46,500][2761575] Updated weights for policy 0, policy_version 4520 (0.0006)
[2025-03-28 18:19:47,963][2761575] Updated weights for policy 0, policy_version 4530 (0.0006)
[2025-03-28 18:19:49,401][2761575] Updated weights for policy 0, policy_version 4540 (0.0006)
[2025-03-28 18:19:50,749][2713170] Fps is (10 sec: 28262.1, 60 sec: 26282.7, 300 sec: 27005.8). Total num frames: 18632704. Throughput: 0: 6577.1. Samples: 4644824. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-03-28 18:19:50,751][2713170] Avg episode reward: [(0, '25.006')]
[2025-03-28 18:19:50,881][2761575] Updated weights for policy 0, policy_version 4550 (0.0007)
[2025-03-28 18:19:52,418][2761575] Updated weights for policy 0, policy_version 4560 (0.0007)
[2025-03-28 18:19:54,046][2761575] Updated weights for policy 0, policy_version 4570 (0.0006)
[2025-03-28 18:19:55,562][2761575] Updated weights for policy 0, policy_version 4580 (0.0007)
[2025-03-28 18:19:55,749][2713170] Fps is (10 sec: 27443.3, 60 sec: 26350.9, 300 sec: 27061.4). Total num frames: 18763776. Throughput: 0: 6574.1. Samples: 4684920. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-28 18:19:55,750][2713170] Avg episode reward: [(0, '26.334')]
[2025-03-28 18:19:55,757][2761553] Saving /home/stalaei/projects/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000004581_18763776.pth...
[2025-03-28 18:19:55,859][2761553] Removing /home/stalaei/projects/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000002999_12283904.pth
[2025-03-28 18:19:56,990][2761575] Updated weights for policy 0, policy_version 4590 (0.0007)
[2025-03-28 18:19:58,372][2761575] Updated weights for policy 0, policy_version 4600 (0.0006)
[2025-03-28 18:19:59,988][2761575] Updated weights for policy 0, policy_version 4610 (0.0007)
[2025-03-28 18:20:00,749][2713170] Fps is (10 sec: 24985.7, 60 sec: 26214.4, 300 sec: 27047.5). Total num frames: 18882560. Throughput: 0: 6301.3. Samples: 4706356. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-28 18:20:00,751][2713170] Avg episode reward: [(0, '27.610')]
[2025-03-28 18:20:02,122][2761575] Updated weights for policy 0, policy_version 4620 (0.0007)
[2025-03-28 18:20:03,624][2761575] Updated weights for policy 0, policy_version 4630 (0.0007)
[2025-03-28 18:20:05,258][2761575] Updated weights for policy 0, policy_version 4640 (0.0007)
[2025-03-28 18:20:05,749][2713170] Fps is (10 sec: 25395.0, 60 sec: 26214.4, 300 sec: 27019.7). Total num frames: 19017728. Throughput: 0: 6645.8. Samples: 4742430. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-28 18:20:05,751][2713170] Avg episode reward: [(0, '30.438')]
[2025-03-28 18:20:06,716][2761575] Updated weights for policy 0, policy_version 4650 (0.0007)
[2025-03-28 18:20:08,237][2761575] Updated weights for policy 0, policy_version 4660 (0.0007)
[2025-03-28 18:20:09,843][2761575] Updated weights for policy 0, policy_version 4670 (0.0007)
[2025-03-28 18:20:10,749][2713170] Fps is (10 sec: 26624.1, 60 sec: 26350.9, 300 sec: 26978.1). Total num frames: 19148800. Throughput: 0: 6598.0. Samples: 4782074. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-28 18:20:10,751][2713170] Avg episode reward: [(0, '26.418')]
[2025-03-28 18:20:11,499][2761575] Updated weights for policy 0, policy_version 4680 (0.0007)
[2025-03-28 18:20:13,157][2761575] Updated weights for policy 0, policy_version 4690 (0.0006)
[2025-03-28 18:20:14,452][2761575] Updated weights for policy 0, policy_version 4700 (0.0006)
[2025-03-28 18:20:15,749][2713170] Fps is (10 sec: 26624.3, 60 sec: 26624.0, 300 sec: 26964.2). Total num frames: 19283968. Throughput: 0: 6695.6. Samples: 4822340. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-28 18:20:15,751][2713170] Avg episode reward: [(0, '26.633')]
[2025-03-28 18:20:15,973][2761575] Updated weights for policy 0, policy_version 4710 (0.0007)
[2025-03-28 18:20:17,517][2761575] Updated weights for policy 0, policy_version 4720 (0.0007)
[2025-03-28 18:20:19,069][2761575] Updated weights for policy 0, policy_version 4730 (0.0006)
[2025-03-28 18:20:20,611][2761575] Updated weights for policy 0, policy_version 4740 (0.0006)
[2025-03-28 18:20:20,749][2713170] Fps is (10 sec: 26624.0, 60 sec: 26487.5, 300 sec: 26936.4). Total num frames: 19415040. Throughput: 0: 6666.3. Samples: 4842088. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-28 18:20:20,750][2713170] Avg episode reward: [(0, '26.869')]
[2025-03-28 18:20:22,165][2761575] Updated weights for policy 0, policy_version 4750 (0.0006)
[2025-03-28 18:20:23,673][2761575] Updated weights for policy 0, policy_version 4760 (0.0006)
[2025-03-28 18:20:25,116][2761575] Updated weights for policy 0, policy_version 4770 (0.0006)
[2025-03-28 18:20:25,749][2713170] Fps is (10 sec: 27033.6, 60 sec: 26555.7, 300 sec: 26936.4). Total num frames: 19554304. Throughput: 0: 6655.9. Samples: 4882256. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-03-28 18:20:25,751][2713170] Avg episode reward: [(0, '30.615')]
[2025-03-28 18:20:26,629][2761575] Updated weights for policy 0, policy_version 4780 (0.0006)
[2025-03-28 18:20:28,286][2761575] Updated weights for policy 0, policy_version 4790 (0.0006)
[2025-03-28 18:20:29,932][2761575] Updated weights for policy 0, policy_version 4800 (0.0006)
[2025-03-28 18:20:30,749][2713170] Fps is (10 sec: 24985.8, 60 sec: 26487.5, 300 sec: 26867.0). Total num frames: 19664896. Throughput: 0: 6188.8. Samples: 4902386. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0)
[2025-03-28 18:20:30,750][2713170] Avg episode reward: [(0, '28.399')]
[2025-03-28 18:20:32,026][2761575] Updated weights for policy 0, policy_version 4810 (0.0006)
[2025-03-28 18:20:33,578][2761575] Updated weights for policy 0, policy_version 4820 (0.0006)
[2025-03-28 18:20:35,092][2761575] Updated weights for policy 0, policy_version 4830 (0.0006)
[2025-03-28 18:20:35,749][2713170] Fps is (10 sec: 24576.0, 60 sec: 26282.7, 300 sec: 26853.1). Total num frames: 19800064. Throughput: 0: 6502.9. Samples: 4937454. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-28 18:20:35,750][2713170] Avg episode reward: [(0, '27.893')]
[2025-03-28 18:20:36,608][2761575] Updated weights for policy 0, policy_version 4840 (0.0006)
[2025-03-28 18:20:38,203][2761575] Updated weights for policy 0, policy_version 4850 (0.0006)
[2025-03-28 18:20:39,719][2761575] Updated weights for policy 0, policy_version 4860 (0.0006)
[2025-03-28 18:20:40,749][2713170] Fps is (10 sec: 26623.1, 60 sec: 26350.8, 300 sec: 26839.2). Total num frames: 19931136. Throughput: 0: 6502.6. Samples: 4977538. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-28 18:20:40,751][2713170] Avg episode reward: [(0, '27.686')]
[2025-03-28 18:20:41,244][2761575] Updated weights for policy 0, policy_version 4870 (0.0007)
[2025-03-28 18:20:42,789][2761575] Updated weights for policy 0, policy_version 4880 (0.0006)
[2025-03-28 18:20:43,401][2761553] Stopping Batcher_0...
[2025-03-28 18:20:43,402][2761553] Loop batcher_evt_loop terminating...
[2025-03-28 18:20:43,401][2713170] Component Batcher_0 stopped!
[2025-03-28 18:20:43,403][2761553] Saving /home/stalaei/projects/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000004884_20004864.pth...
[2025-03-28 18:20:43,439][2761575] Weights refcount: 2 0
[2025-03-28 18:20:43,479][2761553] Removing /home/stalaei/projects/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000003815_15626240.pth
[2025-03-28 18:20:43,482][2761553] Saving /home/stalaei/projects/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000004884_20004864.pth...
[2025-03-28 18:20:43,488][2761574] Stopping RolloutWorker_w0...
[2025-03-28 18:20:43,489][2761574] Loop rollout_proc0_evt_loop terminating...
[2025-03-28 18:20:43,488][2713170] Component RolloutWorker_w0 stopped!
[2025-03-28 18:20:43,496][2713170] Component RolloutWorker_w7 stopped!
[2025-03-28 18:20:43,496][2761582] Stopping RolloutWorker_w7...
[2025-03-28 18:20:43,497][2761582] Loop rollout_proc7_evt_loop terminating...
[2025-03-28 18:20:43,501][2761576] Stopping RolloutWorker_w1...
[2025-03-28 18:20:43,501][2713170] Component RolloutWorker_w1 stopped!
[2025-03-28 18:20:43,502][2761576] Loop rollout_proc1_evt_loop terminating...
[2025-03-28 18:20:43,502][2713170] Component RolloutWorker_w4 stopped!
[2025-03-28 18:20:43,502][2761578] Stopping RolloutWorker_w4...
[2025-03-28 18:20:43,503][2713170] Component RolloutWorker_w2 stopped!
[2025-03-28 18:20:43,503][2761578] Loop rollout_proc4_evt_loop terminating...
[2025-03-28 18:20:43,503][2761577] Stopping RolloutWorker_w2...
[2025-03-28 18:20:43,504][2761577] Loop rollout_proc2_evt_loop terminating...
[2025-03-28 18:20:43,506][2713170] Component RolloutWorker_w6 stopped!
[2025-03-28 18:20:43,506][2713170] Component RolloutWorker_w3 stopped!
[2025-03-28 18:20:43,506][2761581] Stopping RolloutWorker_w6...
[2025-03-28 18:20:43,507][2761581] Loop rollout_proc6_evt_loop terminating...
[2025-03-28 18:20:43,507][2761580] Stopping RolloutWorker_w3...
[2025-03-28 18:20:43,508][2761580] Loop rollout_proc3_evt_loop terminating...
[2025-03-28 18:20:43,509][2713170] Component RolloutWorker_w5 stopped!
[2025-03-28 18:20:43,509][2761579] Stopping RolloutWorker_w5...
[2025-03-28 18:20:43,510][2761579] Loop rollout_proc5_evt_loop terminating...
[2025-03-28 18:20:43,567][2761553] Stopping LearnerWorker_p0...
[2025-03-28 18:20:43,568][2761553] Loop learner_proc0_evt_loop terminating...
[2025-03-28 18:20:43,567][2713170] Component LearnerWorker_p0 stopped!
[2025-03-28 18:20:44,673][2761575] Stopping InferenceWorker_p0-w0...
[2025-03-28 18:20:44,674][2761575] Loop inference_proc0-0_evt_loop terminating...
[2025-03-28 18:20:44,673][2713170] Component InferenceWorker_p0-w0 stopped!
[2025-03-28 18:20:44,675][2713170] Waiting for process learner_proc0 to stop...
[2025-03-28 18:20:44,792][2713170] Waiting for process inference_proc0-0 to join...
[2025-03-28 18:20:45,238][2713170] Waiting for process rollout_proc0 to join...
[2025-03-28 18:20:45,239][2713170] Waiting for process rollout_proc1 to join...
[2025-03-28 18:20:45,240][2713170] Waiting for process rollout_proc2 to join...
[2025-03-28 18:20:45,241][2713170] Waiting for process rollout_proc3 to join...
[2025-03-28 18:20:45,242][2713170] Waiting for process rollout_proc4 to join...
[2025-03-28 18:20:45,243][2713170] Waiting for process rollout_proc5 to join...
[2025-03-28 18:20:45,244][2713170] Waiting for process rollout_proc6 to join...
[2025-03-28 18:20:45,245][2713170] Waiting for process rollout_proc7 to join...
[2025-03-28 18:20:45,246][2713170] Batcher 0 profile tree view:
batching: 77.5075, releasing_batches: 0.1343
[2025-03-28 18:20:45,247][2713170] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0000
  wait_policy_total: 23.1149
update_model: 9.8765
  weight_update: 0.0006
one_step: 0.0013
  handle_policy_step: 682.7108
    deserialize: 52.1600, stack: 3.4602, obs_to_device_normalize: 153.6694, forward: 309.7647, send_messages: 38.3735
    prepare_outputs: 94.5573
      to_cpu: 56.9584
[2025-03-28 18:20:45,247][2713170] Learner 0 profile tree view:
misc: 0.0224, prepare_batch: 28.5087
train: 78.0478
  epoch_init: 0.0258, minibatch_init: 0.0250, losses_postprocess: 1.4478, kl_divergence: 1.7219, after_optimizer: 9.8786
  calculate_losses: 33.1913
    losses_init: 0.0203, forward_head: 3.4984, bptt_initial: 16.6980, tail: 2.4455, advantages_returns: 0.6670, losses: 4.6291
    bptt: 4.4250
      bptt_forward_core: 4.1954
  update: 30.0998
    clip: 3.4030
[2025-03-28 18:20:45,248][2713170] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.6317, enqueue_policy_requests: 31.6726, env_step: 475.5426, overhead: 37.4830, complete_rollouts: 0.9234
save_policy_outputs: 36.5034
  split_output_tensors: 17.4458
[2025-03-28 18:20:45,249][2713170] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.6550, enqueue_policy_requests: 38.5978, env_step: 468.0902, overhead: 38.3842, complete_rollouts: 0.9600
save_policy_outputs: 37.3333
  split_output_tensors: 17.8236
[2025-03-28 18:20:45,249][2713170] Loop Runner_EvtLoop terminating...
[2025-03-28 18:20:45,250][2713170] Runner profile tree view:
main_loop: 764.5789
[2025-03-28 18:20:45,251][2713170] Collected {0: 20004864}, FPS: 26164.5
[2025-03-28 18:21:04,142][2713170] Loading existing experiment configuration from /home/stalaei/projects/deep-rl-class/notebooks/unit8/train_dir/default_experiment/config.json
[2025-03-28 18:21:04,143][2713170] Overriding arg 'num_workers' with value 1 passed from command line
[2025-03-28 18:21:04,144][2713170] Adding new argument 'no_render'=True that is not in the saved config file!
[2025-03-28 18:21:04,145][2713170] Adding new argument 'save_video'=True that is not in the saved config file!
[2025-03-28 18:21:04,145][2713170] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2025-03-28 18:21:04,146][2713170] Adding new argument 'video_name'=None that is not in the saved config file!
[2025-03-28 18:21:04,148][2713170] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2025-03-28 18:21:04,148][2713170] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2025-03-28 18:21:04,149][2713170] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2025-03-28 18:21:04,150][2713170] Adding new argument 'hf_repository'='stalaei/DeepRL_vizdoom_health_gathering_supreme' that is not in the saved config file!
[2025-03-28 18:21:04,151][2713170] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2025-03-28 18:21:04,152][2713170] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2025-03-28 18:21:04,153][2713170] Adding new argument 'train_script'=None that is not in the saved config file!
[2025-03-28 18:21:04,153][2713170] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2025-03-28 18:21:04,154][2713170] Using frameskip 1 and render_action_repeat=4 for evaluation
[2025-03-28 18:21:04,192][2713170] RunningMeanStd input shape: (3, 72, 128)
[2025-03-28 18:21:04,194][2713170] RunningMeanStd input shape: (1,)
[2025-03-28 18:21:04,209][2713170] ConvEncoder: input_channels=3
[2025-03-28 18:21:04,240][2713170] Conv encoder output size: 512
[2025-03-28 18:21:04,241][2713170] Policy head output size: 512
[2025-03-28 18:21:04,908][2713170] Num frames 100...
[2025-03-28 18:21:05,006][2713170] Num frames 200...
[2025-03-28 18:21:05,103][2713170] Num frames 300...
[2025-03-28 18:21:05,203][2713170] Num frames 400...
[2025-03-28 18:21:05,335][2713170] Avg episode rewards: #0: 6.800, true rewards: #0: 4.800
[2025-03-28 18:21:05,336][2713170] Avg episode reward: 6.800, avg true_objective: 4.800
[2025-03-28 18:21:05,357][2713170] Num frames 500...
[2025-03-28 18:21:05,455][2713170] Num frames 600...
[2025-03-28 18:21:05,553][2713170] Num frames 700...
[2025-03-28 18:21:05,652][2713170] Num frames 800...
[2025-03-28 18:21:05,752][2713170] Num frames 900...
[2025-03-28 18:21:05,851][2713170] Num frames 1000...
[2025-03-28 18:21:05,951][2713170] Num frames 1100...
[2025-03-28 18:21:06,053][2713170] Num frames 1200...
[2025-03-28 18:21:06,160][2713170] Num frames 1300...
[2025-03-28 18:21:06,271][2713170] Num frames 1400...
[2025-03-28 18:21:06,381][2713170] Num frames 1500...
[2025-03-28 18:21:06,493][2713170] Num frames 1600...
[2025-03-28 18:21:06,602][2713170] Num frames 1700...
[2025-03-28 18:21:06,709][2713170] Num frames 1800...
[2025-03-28 18:21:06,820][2713170] Num frames 1900...
[2025-03-28 18:21:06,930][2713170] Num frames 2000...
[2025-03-28 18:21:07,031][2713170] Num frames 2100...
[2025-03-28 18:21:07,118][2713170] Avg episode rewards: #0: 24.600, true rewards: #0: 10.600
[2025-03-28 18:21:07,119][2713170] Avg episode reward: 24.600, avg true_objective: 10.600
[2025-03-28 18:21:07,221][2713170] Num frames 2200...
[2025-03-28 18:21:07,305][2713170] Num frames 2300...
[2025-03-28 18:21:07,387][2713170] Num frames 2400...
[2025-03-28 18:21:07,474][2713170] Num frames 2500...
[2025-03-28 18:21:07,559][2713170] Num frames 2600...
[2025-03-28 18:21:07,642][2713170] Num frames 2700...
[2025-03-28 18:21:07,726][2713170] Num frames 2800...
[2025-03-28 18:21:07,811][2713170] Num frames 2900...
[2025-03-28 18:21:07,895][2713170] Num frames 3000...
[2025-03-28 18:21:07,980][2713170] Num frames 3100...
[2025-03-28 18:21:08,066][2713170] Num frames 3200...
[2025-03-28 18:21:08,152][2713170] Num frames 3300...
[2025-03-28 18:21:08,237][2713170] Num frames 3400...
[2025-03-28 18:21:08,325][2713170] Num frames 3500...
[2025-03-28 18:21:08,412][2713170] Num frames 3600...
[2025-03-28 18:21:08,498][2713170] Num frames 3700...
[2025-03-28 18:21:08,583][2713170] Num frames 3800...
[2025-03-28 18:21:08,671][2713170] Num frames 3900...
[2025-03-28 18:21:08,740][2713170] Avg episode rewards: #0: 32.706, true rewards: #0: 13.040
[2025-03-28 18:21:08,741][2713170] Avg episode reward: 32.706, avg true_objective: 13.040
[2025-03-28 18:21:08,835][2713170] Num frames 4000...
[2025-03-28 18:21:08,918][2713170] Num frames 4100...
[2025-03-28 18:21:09,000][2713170] Num frames 4200...
[2025-03-28 18:21:09,080][2713170] Num frames 4300...
[2025-03-28 18:21:09,161][2713170] Num frames 4400...
[2025-03-28 18:21:09,240][2713170] Num frames 4500...
[2025-03-28 18:21:09,321][2713170] Num frames 4600...
[2025-03-28 18:21:09,400][2713170] Num frames 4700...
[2025-03-28 18:21:09,484][2713170] Num frames 4800...
[2025-03-28 18:21:09,567][2713170] Num frames 4900...
[2025-03-28 18:21:09,652][2713170] Num frames 5000...
[2025-03-28 18:21:09,739][2713170] Num frames 5100...
[2025-03-28 18:21:09,828][2713170] Num frames 5200...
[2025-03-28 18:21:09,919][2713170] Num frames 5300...
[2025-03-28 18:21:10,008][2713170] Num frames 5400...
[2025-03-28 18:21:10,092][2713170] Num frames 5500...
[2025-03-28 18:21:10,177][2713170] Num frames 5600...
[2025-03-28 18:21:10,262][2713170] Num frames 5700...
[2025-03-28 18:21:10,348][2713170] Num frames 5800...
[2025-03-28 18:21:10,432][2713170] Num frames 5900...
[2025-03-28 18:21:10,518][2713170] Num frames 6000...
[2025-03-28 18:21:10,584][2713170] Avg episode rewards: #0: 38.029, true rewards: #0: 15.030
[2025-03-28 18:21:10,585][2713170] Avg episode reward: 38.029, avg true_objective: 15.030
[2025-03-28 18:21:10,660][2713170] Num frames 6100...
[2025-03-28 18:21:10,747][2713170] Num frames 6200...
[2025-03-28 18:21:10,831][2713170] Num frames 6300...
[2025-03-28 18:21:10,919][2713170] Avg episode rewards: #0: 31.877, true rewards: #0: 12.678
[2025-03-28 18:21:10,920][2713170] Avg episode reward: 31.877, avg true_objective: 12.678
[2025-03-28 18:21:10,975][2713170] Num frames 6400...
[2025-03-28 18:21:11,059][2713170] Num frames 6500...
[2025-03-28 18:21:11,146][2713170] Num frames 6600...
[2025-03-28 18:21:11,235][2713170] Num frames 6700...
[2025-03-28 18:21:11,321][2713170] Num frames 6800...
[2025-03-28 18:21:11,406][2713170] Num frames 6900...
[2025-03-28 18:21:11,491][2713170] Num frames 7000...
[2025-03-28 18:21:11,579][2713170] Num frames 7100...
[2025-03-28 18:21:11,663][2713170] Num frames 7200...
[2025-03-28 18:21:11,750][2713170] Num frames 7300...
[2025-03-28 18:21:11,832][2713170] Num frames 7400...
[2025-03-28 18:21:11,914][2713170] Num frames 7500...
[2025-03-28 18:21:11,999][2713170] Num frames 7600...
[2025-03-28 18:21:12,098][2713170] Avg episode rewards: #0: 31.918, true rewards: #0: 12.752
[2025-03-28 18:21:12,099][2713170] Avg episode reward: 31.918, avg true_objective: 12.752
[2025-03-28 18:21:12,143][2713170] Num frames 7700...
[2025-03-28 18:21:12,231][2713170] Num frames 7800...
[2025-03-28 18:21:12,315][2713170] Num frames 7900...
[2025-03-28 18:21:12,402][2713170] Num frames 8000...
[2025-03-28 18:21:12,488][2713170] Num frames 8100...
[2025-03-28 18:21:12,580][2713170] Num frames 8200...
[2025-03-28 18:21:12,664][2713170] Num frames 8300...
[2025-03-28 18:21:12,746][2713170] Num frames 8400...
[2025-03-28 18:21:12,829][2713170] Num frames 8500...
[2025-03-28 18:21:12,950][2713170] Avg episode rewards: #0: 29.967, true rewards: #0: 12.253
[2025-03-28 18:21:12,951][2713170] Avg episode reward: 29.967, avg true_objective: 12.253
[2025-03-28 18:21:12,982][2713170] Num frames 8600...
[2025-03-28 18:21:13,075][2713170] Num frames 8700...
[2025-03-28 18:21:13,160][2713170] Num frames 8800...
[2025-03-28 18:21:13,245][2713170] Num frames 8900...
[2025-03-28 18:21:13,335][2713170] Num frames 9000...
[2025-03-28 18:21:13,421][2713170] Num frames 9100...
[2025-03-28 18:21:13,509][2713170] Num frames 9200...
[2025-03-28 18:21:13,598][2713170] Num frames 9300...
[2025-03-28 18:21:13,686][2713170] Num frames 9400...
[2025-03-28 18:21:13,778][2713170] Num frames 9500...
[2025-03-28 18:21:13,867][2713170] Num frames 9600...
[2025-03-28 18:21:13,955][2713170] Num frames 9700...
[2025-03-28 18:21:14,044][2713170] Num frames 9800...
[2025-03-28 18:21:14,132][2713170] Num frames 9900...
[2025-03-28 18:21:14,222][2713170] Num frames 10000...
[2025-03-28 18:21:14,309][2713170] Num frames 10100...
[2025-03-28 18:21:14,397][2713170] Num frames 10200...
[2025-03-28 18:21:14,485][2713170] Num frames 10300...
[2025-03-28 18:21:14,575][2713170] Num frames 10400...
[2025-03-28 18:21:14,665][2713170] Num frames 10500...
[2025-03-28 18:21:14,753][2713170] Num frames 10600...
[2025-03-28 18:21:14,879][2713170] Avg episode rewards: #0: 33.721, true rewards: #0: 13.346
[2025-03-28 18:21:14,880][2713170] Avg episode reward: 33.721, avg true_objective: 13.346
[2025-03-28 18:21:14,907][2713170] Num frames 10700...
[2025-03-28 18:21:15,001][2713170] Num frames 10800...
[2025-03-28 18:21:15,091][2713170] Num frames 10900...
[2025-03-28 18:21:15,181][2713170] Num frames 11000...
[2025-03-28 18:21:15,270][2713170] Num frames 11100...
[2025-03-28 18:21:15,357][2713170] Num frames 11200...
[2025-03-28 18:21:15,446][2713170] Num frames 11300...
[2025-03-28 18:21:15,537][2713170] Num frames 11400...
[2025-03-28 18:21:15,626][2713170] Num frames 11500...
[2025-03-28 18:21:15,715][2713170] Num frames 11600...
[2025-03-28 18:21:15,804][2713170] Num frames 11700...
[2025-03-28 18:21:15,895][2713170] Num frames 11800...
[2025-03-28 18:21:15,979][2713170] Avg episode rewards: #0: 32.810, true rewards: #0: 13.143
[2025-03-28 18:21:15,980][2713170] Avg episode reward: 32.810, avg true_objective: 13.143
[2025-03-28 18:21:16,062][2713170] Num frames 11900...
[2025-03-28 18:21:16,153][2713170] Num frames 12000...
[2025-03-28 18:21:16,243][2713170] Num frames 12100...
[2025-03-28 18:21:16,332][2713170] Num frames 12200...
[2025-03-28 18:21:16,422][2713170] Num frames 12300...
[2025-03-28 18:21:16,511][2713170] Num frames 12400...
[2025-03-28 18:21:16,600][2713170] Num frames 12500...
[2025-03-28 18:21:16,688][2713170] Num frames 12600...
[2025-03-28 18:21:16,774][2713170] Num frames 12700...
[2025-03-28 18:21:16,859][2713170] Num frames 12800...
[2025-03-28 18:21:16,944][2713170] Num frames 12900...
[2025-03-28 18:21:17,018][2713170] Avg episode rewards: #0: 31.721, true rewards: #0: 12.921
[2025-03-28 18:21:17,019][2713170] Avg episode reward: 31.721, avg true_objective: 12.921
[2025-03-28 18:21:22,578][2713170] Replay video saved to /home/stalaei/projects/deep-rl-class/notebooks/unit8/train_dir/default_experiment/replay.mp4!