[2025-03-20 23:52:38,528][00031] Saving configuration to /kaggle/working/train_dir/default_experiment/config.json... |
|
[2025-03-20 23:52:38,530][00031] Rollout worker 0 uses device cpu |
|
[2025-03-20 23:52:38,531][00031] Rollout worker 1 uses device cpu |
|
[2025-03-20 23:52:38,532][00031] Rollout worker 2 uses device cpu |
|
[2025-03-20 23:52:38,533][00031] Rollout worker 3 uses device cpu |
|
[2025-03-20 23:52:38,533][00031] Rollout worker 4 uses device cpu |
|
[2025-03-20 23:52:38,535][00031] Rollout worker 5 uses device cpu |
|
[2025-03-20 23:52:38,536][00031] Rollout worker 6 uses device cpu |
|
[2025-03-20 23:52:38,536][00031] Rollout worker 7 uses device cpu |
|
[2025-03-20 23:52:38,665][00031] Using GPUs [0] for process 0 (actually maps to GPUs [0]) |
|
[2025-03-20 23:52:38,666][00031] InferenceWorker_p0-w0: min num requests: 2 |
|
[2025-03-20 23:52:38,711][00031] Starting all processes... |
|
[2025-03-20 23:52:38,712][00031] Starting process learner_proc0 |
|
[2025-03-20 23:52:38,804][00031] Starting all processes... |
|
[2025-03-20 23:52:38,811][00031] Starting process inference_proc0-0 |
|
[2025-03-20 23:52:38,812][00031] Starting process rollout_proc0 |
|
[2025-03-20 23:52:38,813][00031] Starting process rollout_proc1 |
|
[2025-03-20 23:52:38,813][00031] Starting process rollout_proc2 |
|
[2025-03-20 23:52:38,813][00031] Starting process rollout_proc3 |
|
[2025-03-20 23:52:38,814][00031] Starting process rollout_proc4 |
|
[2025-03-20 23:52:38,815][00031] Starting process rollout_proc5 |
|
[2025-03-20 23:52:38,815][00031] Starting process rollout_proc6 |
|
[2025-03-20 23:52:38,818][00031] Starting process rollout_proc7 |
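
One learner, one inference worker and eight rollout workers is Sample Factory's asynchronous PPO process layout. For orientation, a hypothetical relaunch sketch; the environment name and entry point are assumptions (neither appears in the log), while the worker count, train_dir and experiment name are taken from it:

```python
# Hypothetical relaunch of a run like this one (a sketch, not the exact
# command behind this log). Assumes Sample Factory 2's VizDoom example.
import sys

from sf_examples.vizdoom.train_vizdoom import main

sys.argv = [
    "train_vizdoom",
    "--env=doom_health_gathering_supreme",  # assumption: env not named in the log
    "--num_workers=8",                      # matches rollout_proc0..7 above
    "--train_dir=/kaggle/working/train_dir",
    "--experiment=default_experiment",
]
sys.exit(main())
```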
|
[2025-03-20 23:52:46,134][00213] Worker 2 uses CPU cores [2] |
|
[2025-03-20 23:52:46,654][00196] Using GPUs [0] for process 0 (actually maps to GPUs [0]) |
|
[2025-03-20 23:52:46,657][00196] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 |
|
[2025-03-20 23:52:46,704][00196] Num visible devices: 1 |
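
The learner pins itself to a physical GPU by exporting CUDA_VISIBLE_DEVICES before CUDA initializes, which is why GPU 0 "actually maps to" local index 0 and only one device is visible. A minimal sketch of the mechanism (not Sample Factory's code):

```python
import os

# Must be set before torch initializes CUDA in this process.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import torch

print(torch.cuda.device_count())  # -> 1: only the selected GPU is visible
device = torch.device("cuda:0")   # physical GPU 0, remapped to local index 0
```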
|
[2025-03-20 23:52:46,717][00196] Starting seed is not provided |
|
[2025-03-20 23:52:46,718][00196] Using GPUs [0] for process 0 (actually maps to GPUs [0]) |
|
[2025-03-20 23:52:46,718][00196] Initializing actor-critic model on device cuda:0 |
|
[2025-03-20 23:52:46,719][00196] RunningMeanStd input shape: (3, 72, 128) |
|
[2025-03-20 23:52:46,725][00196] RunningMeanStd input shape: (1,) |
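
The two RunningMeanStd modules keep streaming statistics for the (3, 72, 128) observations and the scalar returns. A sketch of the usual streaming mean/variance update such a module performs (illustrative; the library's in-place TorchScript version differs in detail):

```python
import numpy as np

class RunningMeanStd:
    """Streaming mean/variance in the spirit of the log's RunningMeanStd.
    Illustrative sketch, not Sample Factory's implementation."""

    def __init__(self, shape, eps=1e-4):
        self.mean = np.zeros(shape, dtype=np.float64)
        self.var = np.ones(shape, dtype=np.float64)
        self.count = eps

    def update(self, batch):
        # Chan et al. parallel-variance merge of a batch into the running stats.
        b_mean, b_var, b_count = batch.mean(0), batch.var(0), batch.shape[0]
        delta = b_mean - self.mean
        tot = self.count + b_count
        self.mean += delta * b_count / tot
        m_a = self.var * self.count
        m_b = b_var * b_count
        self.var = (m_a + m_b + delta**2 * self.count * b_count / tot) / tot
        self.count = tot

    def normalize(self, x):
        return (x - self.mean) / np.sqrt(self.var + 1e-8)
```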
|
[2025-03-20 23:52:46,793][00196] ConvEncoder: input_channels=3 |
|
[2025-03-20 23:52:46,980][00210] Worker 0 uses CPU cores [0] |
|
[2025-03-20 23:52:47,234][00217] Worker 7 uses CPU cores [3] |
|
[2025-03-20 23:52:47,376][00212] Worker 1 uses CPU cores [1] |
|
[2025-03-20 23:52:47,394][00196] Conv encoder output size: 512 |
|
[2025-03-20 23:52:47,394][00196] Policy head output size: 512 |
|
[2025-03-20 23:52:47,488][00196] Created Actor Critic model with architecture:
[2025-03-20 23:52:47,488][00196] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
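
The repr above fixes the shape flow: a 3-channel 72x128 frame through three Conv2d+ELU stages, an MLP to 512 features, a GRU(512, 512) core, then a 1-unit value head and 5 action logits. A PyTorch sketch that reproduces this flow; the conv filter sizes are assumptions (Sample Factory's default Atari-style stack), since the repr hides them, and the normalizers are omitted:

```python
import torch
import torch.nn as nn

class ActorCriticSketch(nn.Module):
    # Illustrative reconstruction of the printed architecture. Kernel sizes
    # and strides are assumed; the log only confirms layer types and the
    # 512-dim encoder/core outputs.
    def __init__(self, num_actions=5):
        super().__init__()
        self.conv_head = nn.Sequential(
            nn.Conv2d(3, 32, 8, stride=4), nn.ELU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ELU(),
            nn.Conv2d(64, 128, 3, stride=2), nn.ELU(),
        )
        with torch.no_grad():  # infer the flattened conv output size
            n = self.conv_head(torch.zeros(1, 3, 72, 128)).numel()
        self.mlp = nn.Sequential(nn.Linear(n, 512), nn.ELU())
        self.core = nn.GRU(512, 512)                       # GRU(512, 512) above
        self.critic_linear = nn.Linear(512, 1)             # value head
        self.action_logits = nn.Linear(512, num_actions)   # 5 action logits

    def forward(self, obs, rnn_state=None):
        x = self.mlp(self.conv_head(obs).flatten(1))
        x, rnn_state = self.core(x.unsqueeze(0), rnn_state)
        x = x.squeeze(0)
        return self.action_logits(x), self.critic_linear(x), rnn_state
```

With these assumed filters the conv head emits a 128x3x6 map (2304 features) ahead of the 512-unit MLP, consistent with the "Conv encoder output size: 512" line once the MLP is applied.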
|
[2025-03-20 23:52:47,496][00215] Worker 5 uses CPU cores [1] |
|
[2025-03-20 23:52:47,501][00211] Worker 3 uses CPU cores [3] |
|
[2025-03-20 23:52:47,539][00209] Using GPUs [0] for process 0 (actually maps to GPUs [0]) |
|
[2025-03-20 23:52:47,539][00209] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 |
|
[2025-03-20 23:52:47,568][00216] Worker 6 uses CPU cores [2] |
|
[2025-03-20 23:52:47,571][00209] Num visible devices: 1 |
|
[2025-03-20 23:52:47,592][00214] Worker 4 uses CPU cores [0] |
|
[2025-03-20 23:52:47,798][00196] Using optimizer <class 'torch.optim.adam.Adam'> |
|
[2025-03-20 23:52:49,651][00196] No checkpoints found |
|
[2025-03-20 23:52:49,651][00196] Did not load from checkpoint, starting from scratch! |
|
[2025-03-20 23:52:49,651][00196] Initialized policy 0 weights for model version 0 |
|
[2025-03-20 23:52:49,655][00196] LearnerWorker_p0 finished initialization! |
|
[2025-03-20 23:52:49,656][00196] Using GPUs [0] for process 0 (actually maps to GPUs [0]) |
|
[2025-03-20 23:52:49,751][00209] RunningMeanStd input shape: (3, 72, 128) |
|
[2025-03-20 23:52:49,752][00209] RunningMeanStd input shape: (1,) |
|
[2025-03-20 23:52:49,765][00209] ConvEncoder: input_channels=3 |
|
[2025-03-20 23:52:49,879][00209] Conv encoder output size: 512 |
|
[2025-03-20 23:52:49,880][00209] Policy head output size: 512 |
|
[2025-03-20 23:52:49,956][00031] Inference worker 0-0 is ready! |
|
[2025-03-20 23:52:49,956][00031] All inference workers are ready! Signal rollout workers to start! |
|
[2025-03-20 23:52:50,070][00213] Doom resolution: 160x120, resize resolution: (128, 72) |
|
[2025-03-20 23:52:50,078][00212] Doom resolution: 160x120, resize resolution: (128, 72) |
|
[2025-03-20 23:52:50,078][00211] Doom resolution: 160x120, resize resolution: (128, 72) |
|
[2025-03-20 23:52:50,076][00215] Doom resolution: 160x120, resize resolution: (128, 72) |
|
[2025-03-20 23:52:50,080][00214] Doom resolution: 160x120, resize resolution: (128, 72) |
|
[2025-03-20 23:52:50,080][00216] Doom resolution: 160x120, resize resolution: (128, 72) |
|
[2025-03-20 23:52:50,081][00217] Doom resolution: 160x120, resize resolution: (128, 72) |
|
[2025-03-20 23:52:50,086][00210] Doom resolution: 160x120, resize resolution: (128, 72) |
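
All eight workers render Doom at 160x120 and downscale to (128, 72), i.e. width 128 by height 72, matching the (3, 72, 128) CHW input the learner reported. A sketch of that preprocessing step (OpenCV is my choice here, not necessarily what the library uses):

```python
import cv2
import numpy as np

def preprocess(frame: np.ndarray) -> np.ndarray:
    """Resize a 120x160x3 Doom frame to the network's 72x128 input.
    Note cv2.resize takes (width, height), hence the (128, 72) tuple."""
    resized = cv2.resize(frame, (128, 72), interpolation=cv2.INTER_AREA)
    return resized.transpose(2, 0, 1)  # HWC -> CHW to match (3, 72, 128)
```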
|
[2025-03-20 23:52:50,671][00210] Decorrelating experience for 0 frames... |
|
[2025-03-20 23:52:50,671][00217] Decorrelating experience for 0 frames... |
|
[2025-03-20 23:52:50,769][00212] Decorrelating experience for 0 frames... |
|
[2025-03-20 23:52:51,014][00213] Decorrelating experience for 0 frames... |
|
[2025-03-20 23:52:51,023][00216] Decorrelating experience for 0 frames... |
|
[2025-03-20 23:52:51,102][00217] Decorrelating experience for 32 frames... |
|
[2025-03-20 23:52:51,202][00212] Decorrelating experience for 32 frames... |
|
[2025-03-20 23:52:51,406][00210] Decorrelating experience for 32 frames... |
|
[2025-03-20 23:52:51,559][00217] Decorrelating experience for 64 frames... |
|
[2025-03-20 23:52:51,818][00214] Decorrelating experience for 0 frames... |
|
[2025-03-20 23:52:51,820][00213] Decorrelating experience for 32 frames... |
|
[2025-03-20 23:52:51,934][00217] Decorrelating experience for 96 frames... |
|
[2025-03-20 23:52:51,947][00216] Decorrelating experience for 32 frames... |
|
[2025-03-20 23:52:52,278][00212] Decorrelating experience for 64 frames... |
|
[2025-03-20 23:52:52,420][00211] Decorrelating experience for 0 frames... |
|
[2025-03-20 23:52:52,632][00213] Decorrelating experience for 64 frames... |
|
[2025-03-20 23:52:52,724][00214] Decorrelating experience for 32 frames... |
|
[2025-03-20 23:52:52,868][00211] Decorrelating experience for 32 frames... |
|
[2025-03-20 23:52:52,903][00212] Decorrelating experience for 96 frames... |
|
[2025-03-20 23:52:53,299][00214] Decorrelating experience for 64 frames... |
|
[2025-03-20 23:52:53,344][00216] Decorrelating experience for 64 frames... |
|
[2025-03-20 23:52:53,405][00211] Decorrelating experience for 64 frames... |
|
[2025-03-20 23:52:53,545][00213] Decorrelating experience for 96 frames... |
|
[2025-03-20 23:52:53,890][00031] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 12. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) |
|
[2025-03-20 23:52:53,892][00031] Avg episode reward: [(0, '1.280')] |
|
[2025-03-20 23:52:53,999][00210] Decorrelating experience for 64 frames... |
|
[2025-03-20 23:52:54,225][00211] Decorrelating experience for 96 frames... |
|
[2025-03-20 23:52:54,372][00214] Decorrelating experience for 96 frames... |
|
[2025-03-20 23:52:54,852][00216] Decorrelating experience for 96 frames... |
|
[2025-03-20 23:52:55,269][00210] Decorrelating experience for 96 frames... |
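
Before real collection starts, each worker steps its environments with random actions for a short warm-up, logging progress in 32-frame increments, so the eight workers don't begin rollouts from identical, synchronized states. A minimal sketch of the idea (the per-worker frame budgets and env API are simplifications):

```python
def decorrelate_experience(env, total_frames=96, log_every=32):
    """Warm an env up with random actions before rollouts begin (sketch)."""
    obs = env.reset()
    for frame in range(total_frames + 1):
        if frame % log_every == 0:
            print(f"Decorrelating experience for {frame} frames...")
        if frame == total_frames:
            break
        obs, reward, done, info = env.step(env.action_space.sample())
        if done:
            obs = env.reset()
```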
|
[2025-03-20 23:52:55,749][00196] Signal inference workers to stop experience collection... |
|
[2025-03-20 23:52:55,759][00209] InferenceWorker_p0-w0: stopping experience collection |
|
[2025-03-20 23:52:57,987][00196] Signal inference workers to resume experience collection... |
|
[2025-03-20 23:52:57,988][00209] InferenceWorker_p0-w0: resuming experience collection |
|
[2025-03-20 23:52:58,654][00031] Heartbeat connected on Batcher_0 |
|
[2025-03-20 23:52:58,658][00031] Heartbeat connected on LearnerWorker_p0 |
|
[2025-03-20 23:52:58,671][00031] Heartbeat connected on InferenceWorker_p0-w0 |
|
[2025-03-20 23:52:58,684][00031] Heartbeat connected on RolloutWorker_w0 |
|
[2025-03-20 23:52:58,693][00031] Heartbeat connected on RolloutWorker_w2 |
|
[2025-03-20 23:52:58,706][00031] Heartbeat connected on RolloutWorker_w4 |
|
[2025-03-20 23:52:58,711][00031] Heartbeat connected on RolloutWorker_w6 |
|
[2025-03-20 23:52:58,726][00031] Heartbeat connected on RolloutWorker_w1 |
|
[2025-03-20 23:52:58,727][00031] Heartbeat connected on RolloutWorker_w3 |
|
[2025-03-20 23:52:58,731][00031] Heartbeat connected on RolloutWorker_w7 |
|
[2025-03-20 23:52:58,892][00031] Fps is (10 sec: 2456.4, 60 sec: 2456.4, 300 sec: 2456.4). Total num frames: 12288. Throughput: 0: 473.8. Samples: 2382. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) |
|
[2025-03-20 23:52:58,895][00031] Avg episode reward: [(0, '2.841')] |
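
These periodic status lines compute FPS over sliding 10-, 60- and 300-second windows from (timestamp, total frames) snapshots, which is why the very first report at 23:52:53 shows nan for every window. A minimal sketch of that bookkeeping (illustrative, not Sample Factory's actual code):

```python
import time
from collections import deque

class FpsMonitor:
    """Sliding-window FPS like the '(10 sec: ..., 60 sec: ..., 300 sec: ...)'
    readouts above."""

    def __init__(self, windows=(10, 60, 300)):
        self.windows = windows
        self.history = deque()  # (timestamp, total_frames) snapshots

    def report(self, total_frames):
        now = time.time()
        fps = {}
        for w in self.windows:
            # Oldest snapshot still inside this window, if any.
            past = next(((t, f) for t, f in self.history if now - t <= w), None)
            # No snapshot yet -> nan, matching the very first report.
            fps[w] = ((total_frames - past[1]) / max(now - past[0], 1e-8)
                      if past else float("nan"))
        self.history.append((now, total_frames))
        while self.history and now - self.history[0][0] > max(self.windows):
            self.history.popleft()  # drop snapshots past the longest window
        return fps
```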
|
[2025-03-20 23:53:02,646][00209] Updated weights for policy 0, policy_version 10 (0.0151) |
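
Between status reports, the inference worker refreshes its copy of the policy from the learner roughly every 10 policy versions; the trailing number appears to be the time the update took in seconds. A hedged sketch of version-gated syncing (the shared-state handle is an assumption; Sample Factory itself uses shared memory):

```python
import time

def maybe_sync_weights(model, shared_state, local_version):
    """Pull newer learner weights into an inference worker (sketch).
    shared_state is an assumed dict holding the learner's latest
    parameters and version number."""
    latest = shared_state["policy_version"]
    if latest > local_version:
        t0 = time.time()
        model.load_state_dict(shared_state["state_dict"])
        print(f"Updated weights for policy 0, policy_version {latest} "
              f"({time.time() - t0:.4f})")
        local_version = latest
    return local_version
```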
|
[2025-03-20 23:53:03,890][00031] Fps is (10 sec: 4915.2, 60 sec: 4915.2, 300 sec: 4915.2). Total num frames: 49152. Throughput: 0: 952.6. Samples: 9538. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) |
|
[2025-03-20 23:53:03,894][00031] Avg episode reward: [(0, '4.181')] |
|
[2025-03-20 23:53:07,505][00209] Updated weights for policy 0, policy_version 20 (0.0015) |
|
[2025-03-20 23:53:08,890][00031] Fps is (10 sec: 8194.0, 60 sec: 6280.5, 300 sec: 6280.5). Total num frames: 94208. Throughput: 0: 1473.3. Samples: 22112. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) |
|
[2025-03-20 23:53:08,895][00031] Avg episode reward: [(0, '4.555')] |
|
[2025-03-20 23:53:12,075][00209] Updated weights for policy 0, policy_version 30 (0.0019) |
|
[2025-03-20 23:53:13,890][00031] Fps is (10 sec: 8601.6, 60 sec: 6758.4, 300 sec: 6758.4). Total num frames: 135168. Throughput: 0: 1432.6. Samples: 28664. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) |
|
[2025-03-20 23:53:13,891][00031] Avg episode reward: [(0, '4.422')] |
|
[2025-03-20 23:53:13,951][00196] Saving new best policy, reward=4.422! |
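
Whenever the reported average episode reward beats the running best, the learner snapshots the policy, as in the line above. A trivial sketch of that bookkeeping:

```python
class BestPolicyTracker:
    """Mirrors the 'Saving new best policy, reward=X!' lines (sketch)."""

    def __init__(self):
        self.best_reward = float("-inf")

    def maybe_save(self, avg_reward, save_fn):
        if avg_reward > self.best_reward:
            self.best_reward = avg_reward
            save_fn()  # e.g. write a best_*.pth under the experiment dir
            print(f"Saving new best policy, reward={avg_reward:.3f}!")
```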
|
[2025-03-20 23:53:16,779][00209] Updated weights for policy 0, policy_version 40 (0.0015) |
|
[2025-03-20 23:53:18,890][00031] Fps is (10 sec: 8601.1, 60 sec: 7208.8, 300 sec: 7208.8). Total num frames: 180224. Throughput: 0: 1680.6. Samples: 42028. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) |
|
[2025-03-20 23:53:18,892][00031] Avg episode reward: [(0, '4.515')] |
|
[2025-03-20 23:53:18,894][00196] Saving new best policy, reward=4.515! |
|
[2025-03-20 23:53:21,353][00209] Updated weights for policy 0, policy_version 50 (0.0015) |
|
[2025-03-20 23:53:23,890][00031] Fps is (10 sec: 9010.9, 60 sec: 7509.2, 300 sec: 7509.2). Total num frames: 225280. Throughput: 0: 1842.3. Samples: 55282. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) |
|
[2025-03-20 23:53:23,892][00031] Avg episode reward: [(0, '4.373')] |
|
[2025-03-20 23:53:26,024][00209] Updated weights for policy 0, policy_version 60 (0.0018) |
|
[2025-03-20 23:53:28,890][00031] Fps is (10 sec: 8602.0, 60 sec: 7606.9, 300 sec: 7606.9). Total num frames: 266240. Throughput: 0: 1769.1. Samples: 61930. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) |
|
[2025-03-20 23:53:28,891][00031] Avg episode reward: [(0, '4.372')] |
|
[2025-03-20 23:53:31,341][00209] Updated weights for policy 0, policy_version 70 (0.0017) |
|
[2025-03-20 23:53:33,890][00031] Fps is (10 sec: 8192.4, 60 sec: 7680.0, 300 sec: 7680.0). Total num frames: 307200. Throughput: 0: 1840.9. Samples: 73648. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) |
|
[2025-03-20 23:53:33,891][00031] Avg episode reward: [(0, '4.470')] |
|
[2025-03-20 23:53:35,957][00209] Updated weights for policy 0, policy_version 80 (0.0017) |
|
[2025-03-20 23:53:38,890][00031] Fps is (10 sec: 8601.6, 60 sec: 7827.9, 300 sec: 7827.9). Total num frames: 352256. Throughput: 0: 1932.4. Samples: 86972. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) |
|
[2025-03-20 23:53:38,892][00031] Avg episode reward: [(0, '4.562')] |
|
[2025-03-20 23:53:38,893][00196] Saving new best policy, reward=4.562! |
|
[2025-03-20 23:53:40,696][00209] Updated weights for policy 0, policy_version 90 (0.0020) |
|
[2025-03-20 23:53:43,890][00031] Fps is (10 sec: 8601.5, 60 sec: 7864.3, 300 sec: 7864.3). Total num frames: 393216. Throughput: 0: 2021.0. Samples: 93324. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) |
|
[2025-03-20 23:53:43,892][00031] Avg episode reward: [(0, '4.680')] |
|
[2025-03-20 23:53:43,956][00196] Saving new best policy, reward=4.680! |
|
[2025-03-20 23:53:45,357][00209] Updated weights for policy 0, policy_version 100 (0.0016) |
|
[2025-03-20 23:53:48,890][00031] Fps is (10 sec: 8601.6, 60 sec: 7968.6, 300 sec: 7968.6). Total num frames: 438272. Throughput: 0: 2155.6. Samples: 106540. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) |
|
[2025-03-20 23:53:48,892][00031] Avg episode reward: [(0, '4.689')] |
|
[2025-03-20 23:53:48,894][00196] Saving new best policy, reward=4.689! |
|
[2025-03-20 23:53:50,095][00209] Updated weights for policy 0, policy_version 110 (0.0017) |
|
[2025-03-20 23:53:53,890][00031] Fps is (10 sec: 9011.3, 60 sec: 8055.5, 300 sec: 8055.5). Total num frames: 483328. Throughput: 0: 2172.0. Samples: 119850. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) |
|
[2025-03-20 23:53:53,892][00031] Avg episode reward: [(0, '4.561')] |
|
[2025-03-20 23:53:54,569][00209] Updated weights for policy 0, policy_version 120 (0.0016) |
|
[2025-03-20 23:53:58,890][00031] Fps is (10 sec: 9011.2, 60 sec: 8601.9, 300 sec: 8129.0). Total num frames: 528384. Throughput: 0: 2177.5. Samples: 126652. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) |
|
[2025-03-20 23:53:58,891][00031] Avg episode reward: [(0, '4.622')] |
|
[2025-03-20 23:53:59,067][00209] Updated weights for policy 0, policy_version 130 (0.0015) |
|
[2025-03-20 23:54:03,890][00031] Fps is (10 sec: 8192.0, 60 sec: 8601.6, 300 sec: 8075.0). Total num frames: 565248. Throughput: 0: 2145.1. Samples: 138556. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) |
|
[2025-03-20 23:54:03,891][00031] Avg episode reward: [(0, '4.946')] |
|
[2025-03-20 23:54:03,900][00196] Saving new best policy, reward=4.946! |
|
[2025-03-20 23:54:04,401][00209] Updated weights for policy 0, policy_version 140 (0.0014) |
|
[2025-03-20 23:54:08,890][00031] Fps is (10 sec: 8192.0, 60 sec: 8601.6, 300 sec: 8137.4). Total num frames: 610304. Throughput: 0: 2142.7. Samples: 151702. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) |
|
[2025-03-20 23:54:08,892][00031] Avg episode reward: [(0, '4.952')] |
|
[2025-03-20 23:54:08,895][00196] Saving new best policy, reward=4.952! |
|
[2025-03-20 23:54:09,162][00209] Updated weights for policy 0, policy_version 150 (0.0018) |
|
[2025-03-20 23:54:13,618][00209] Updated weights for policy 0, policy_version 160 (0.0017) |
|
[2025-03-20 23:54:13,890][00031] Fps is (10 sec: 9011.3, 60 sec: 8669.9, 300 sec: 8192.0). Total num frames: 655360. Throughput: 0: 2140.7. Samples: 158262. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) |
|
[2025-03-20 23:54:13,891][00031] Avg episode reward: [(0, '5.600')] |
|
[2025-03-20 23:54:13,896][00196] Saving new best policy, reward=5.600! |
|
[2025-03-20 23:54:18,170][00209] Updated weights for policy 0, policy_version 170 (0.0016) |
|
[2025-03-20 23:54:18,891][00031] Fps is (10 sec: 9010.4, 60 sec: 8669.8, 300 sec: 8240.1). Total num frames: 700416. Throughput: 0: 2180.9. Samples: 171792. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) |
|
[2025-03-20 23:54:18,893][00031] Avg episode reward: [(0, '5.819')] |
|
[2025-03-20 23:54:18,899][00196] Saving new best policy, reward=5.819! |
|
[2025-03-20 23:54:22,770][00209] Updated weights for policy 0, policy_version 180 (0.0023) |
|
[2025-03-20 23:54:23,890][00031] Fps is (10 sec: 9011.2, 60 sec: 8669.9, 300 sec: 8283.0). Total num frames: 745472. Throughput: 0: 2185.1. Samples: 185302. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) |
|
[2025-03-20 23:54:23,892][00031] Avg episode reward: [(0, '5.945')] |
|
[2025-03-20 23:54:23,899][00196] Saving new best policy, reward=5.945! |
|
[2025-03-20 23:54:27,279][00209] Updated weights for policy 0, policy_version 190 (0.0016) |
|
[2025-03-20 23:54:28,890][00031] Fps is (10 sec: 9012.1, 60 sec: 8738.1, 300 sec: 8321.3). Total num frames: 790528. Throughput: 0: 2194.8. Samples: 192088. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) |
|
[2025-03-20 23:54:28,892][00031] Avg episode reward: [(0, '5.375')] |
|
[2025-03-20 23:54:31,971][00209] Updated weights for policy 0, policy_version 200 (0.0015) |
|
[2025-03-20 23:54:33,890][00031] Fps is (10 sec: 8601.6, 60 sec: 8738.1, 300 sec: 8314.9). Total num frames: 831488. Throughput: 0: 2198.5. Samples: 205472. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) |
|
[2025-03-20 23:54:33,893][00031] Avg episode reward: [(0, '6.221')] |
|
[2025-03-20 23:54:33,901][00196] Saving /kaggle/working/train_dir/default_experiment/checkpoint_p0/checkpoint_000000203_831488.pth... |
|
[2025-03-20 23:54:33,985][00196] Saving new best policy, reward=6.221! |
|
[2025-03-20 23:54:37,191][00209] Updated weights for policy 0, policy_version 210 (0.0017) |
|
[2025-03-20 23:54:38,890][00031] Fps is (10 sec: 8192.0, 60 sec: 8669.9, 300 sec: 8309.0). Total num frames: 872448. Throughput: 0: 2168.2. Samples: 217418. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) |
|
[2025-03-20 23:54:38,891][00031] Avg episode reward: [(0, '6.641')] |
|
[2025-03-20 23:54:38,893][00196] Saving new best policy, reward=6.641! |
|
[2025-03-20 23:54:41,755][00209] Updated weights for policy 0, policy_version 220 (0.0016) |
|
[2025-03-20 23:54:43,890][00031] Fps is (10 sec: 8601.6, 60 sec: 8738.1, 300 sec: 8340.9). Total num frames: 917504. Throughput: 0: 2163.4. Samples: 224004. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) |
|
[2025-03-20 23:54:43,891][00031] Avg episode reward: [(0, '5.867')] |
|
[2025-03-20 23:54:46,276][00209] Updated weights for policy 0, policy_version 230 (0.0016) |
|
[2025-03-20 23:54:48,890][00031] Fps is (10 sec: 9011.0, 60 sec: 8738.1, 300 sec: 8370.1). Total num frames: 962560. Throughput: 0: 2201.8. Samples: 237638. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) |
|
[2025-03-20 23:54:48,892][00031] Avg episode reward: [(0, '6.875')] |
|
[2025-03-20 23:54:48,895][00196] Saving new best policy, reward=6.875! |
|
[2025-03-20 23:54:50,909][00209] Updated weights for policy 0, policy_version 240 (0.0023) |
|
[2025-03-20 23:54:53,890][00031] Fps is (10 sec: 9011.2, 60 sec: 8738.1, 300 sec: 8396.8). Total num frames: 1007616. Throughput: 0: 2204.4. Samples: 250900. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) |
|
[2025-03-20 23:54:53,891][00031] Avg episode reward: [(0, '7.446')] |
|
[2025-03-20 23:54:53,900][00196] Saving new best policy, reward=7.446! |
|
[2025-03-20 23:54:55,549][00209] Updated weights for policy 0, policy_version 250 (0.0019) |
|
[2025-03-20 23:54:58,890][00031] Fps is (10 sec: 9011.3, 60 sec: 8738.1, 300 sec: 8421.4). Total num frames: 1052672. Throughput: 0: 2206.6. Samples: 257560. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) |
|
[2025-03-20 23:54:58,892][00031] Avg episode reward: [(0, '8.269')] |
|
[2025-03-20 23:54:58,893][00196] Saving new best policy, reward=8.269! |
|
[2025-03-20 23:55:00,292][00209] Updated weights for policy 0, policy_version 260 (0.0017) |
|
[2025-03-20 23:55:03,890][00031] Fps is (10 sec: 8601.0, 60 sec: 8806.3, 300 sec: 8412.5). Total num frames: 1093632. Throughput: 0: 2197.9. Samples: 270698. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) |
|
[2025-03-20 23:55:03,892][00031] Avg episode reward: [(0, '7.845')] |
|
[2025-03-20 23:55:04,872][00209] Updated weights for policy 0, policy_version 270 (0.0017) |
|
[2025-03-20 23:55:08,890][00031] Fps is (10 sec: 8192.0, 60 sec: 8738.1, 300 sec: 8404.4). Total num frames: 1134592. Throughput: 0: 2157.3. Samples: 282380. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) |
|
[2025-03-20 23:55:08,892][00031] Avg episode reward: [(0, '8.440')] |
|
[2025-03-20 23:55:08,895][00196] Saving new best policy, reward=8.440! |
|
[2025-03-20 23:55:10,203][00209] Updated weights for policy 0, policy_version 280 (0.0015) |
|
[2025-03-20 23:55:13,890][00031] Fps is (10 sec: 8601.9, 60 sec: 8738.1, 300 sec: 8426.0). Total num frames: 1179648. Throughput: 0: 2151.3. Samples: 288898. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) |
|
[2025-03-20 23:55:13,892][00031] Avg episode reward: [(0, '8.213')] |
|
[2025-03-20 23:55:14,817][00209] Updated weights for policy 0, policy_version 290 (0.0014) |
|
[2025-03-20 23:55:18,890][00031] Fps is (10 sec: 9011.3, 60 sec: 8738.3, 300 sec: 8446.2). Total num frames: 1224704. Throughput: 0: 2156.3. Samples: 302504. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) |
|
[2025-03-20 23:55:18,891][00031] Avg episode reward: [(0, '8.699')] |
|
[2025-03-20 23:55:18,893][00196] Saving new best policy, reward=8.699! |
|
[2025-03-20 23:55:19,371][00209] Updated weights for policy 0, policy_version 300 (0.0019) |
|
[2025-03-20 23:55:23,890][00031] Fps is (10 sec: 8601.9, 60 sec: 8669.9, 300 sec: 8437.8). Total num frames: 1265664. Throughput: 0: 2185.8. Samples: 315780. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) |
|
[2025-03-20 23:55:23,891][00031] Avg episode reward: [(0, '9.882')] |
|
[2025-03-20 23:55:23,940][00196] Saving new best policy, reward=9.882! |
|
[2025-03-20 23:55:23,942][00209] Updated weights for policy 0, policy_version 310 (0.0014) |
|
[2025-03-20 23:55:28,487][00209] Updated weights for policy 0, policy_version 320 (0.0017) |
|
[2025-03-20 23:55:28,890][00031] Fps is (10 sec: 8601.3, 60 sec: 8669.8, 300 sec: 8456.2). Total num frames: 1310720. Throughput: 0: 2188.0. Samples: 322464. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) |
|
[2025-03-20 23:55:28,891][00031] Avg episode reward: [(0, '9.444')] |
|
[2025-03-20 23:55:33,125][00209] Updated weights for policy 0, policy_version 330 (0.0016) |
|
[2025-03-20 23:55:33,890][00031] Fps is (10 sec: 9011.2, 60 sec: 8738.1, 300 sec: 8473.6). Total num frames: 1355776. Throughput: 0: 2181.2. Samples: 335790. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) |
|
[2025-03-20 23:55:33,891][00031] Avg episode reward: [(0, '8.868')] |
|
[2025-03-20 23:55:37,741][00209] Updated weights for policy 0, policy_version 340 (0.0015) |
|
[2025-03-20 23:55:38,890][00031] Fps is (10 sec: 8601.8, 60 sec: 8738.1, 300 sec: 8465.1). Total num frames: 1396736. Throughput: 0: 2184.3. Samples: 349194. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) |
|
[2025-03-20 23:55:38,891][00031] Avg episode reward: [(0, '9.003')] |
|
[2025-03-20 23:55:42,967][00209] Updated weights for policy 0, policy_version 350 (0.0019) |
|
[2025-03-20 23:55:43,890][00031] Fps is (10 sec: 8192.0, 60 sec: 8669.9, 300 sec: 8457.0). Total num frames: 1437696. Throughput: 0: 2152.1. Samples: 354404. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) |
|
[2025-03-20 23:55:43,892][00031] Avg episode reward: [(0, '8.760')] |
|
[2025-03-20 23:55:47,523][00209] Updated weights for policy 0, policy_version 360 (0.0017) |
|
[2025-03-20 23:55:48,890][00031] Fps is (10 sec: 8601.3, 60 sec: 8669.8, 300 sec: 8472.9). Total num frames: 1482752. Throughput: 0: 2160.4. Samples: 367914. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) |
|
[2025-03-20 23:55:48,893][00031] Avg episode reward: [(0, '9.336')] |
|
[2025-03-20 23:55:52,207][00209] Updated weights for policy 0, policy_version 370 (0.0015) |
|
[2025-03-20 23:55:53,890][00031] Fps is (10 sec: 9010.9, 60 sec: 8669.8, 300 sec: 8487.8). Total num frames: 1527808. Throughput: 0: 2195.0. Samples: 381154. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) |
|
[2025-03-20 23:55:53,892][00031] Avg episode reward: [(0, '9.449')] |
|
[2025-03-20 23:55:56,655][00209] Updated weights for policy 0, policy_version 380 (0.0018) |
|
[2025-03-20 23:55:58,890][00031] Fps is (10 sec: 9011.5, 60 sec: 8669.9, 300 sec: 8502.0). Total num frames: 1572864. Throughput: 0: 2200.8. Samples: 387934. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) |
|
[2025-03-20 23:55:58,893][00031] Avg episode reward: [(0, '10.815')] |
|
[2025-03-20 23:55:58,894][00196] Saving new best policy, reward=10.815! |
|
[2025-03-20 23:56:01,423][00209] Updated weights for policy 0, policy_version 390 (0.0015) |
|
[2025-03-20 23:56:03,890][00031] Fps is (10 sec: 9011.4, 60 sec: 8738.2, 300 sec: 8515.4). Total num frames: 1617920. Throughput: 0: 2193.2. Samples: 401198. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) |
|
[2025-03-20 23:56:03,892][00031] Avg episode reward: [(0, '12.911')] |
|
[2025-03-20 23:56:03,902][00196] Saving new best policy, reward=12.911! |
|
[2025-03-20 23:56:06,074][00209] Updated weights for policy 0, policy_version 400 (0.0019) |
|
[2025-03-20 23:56:08,890][00031] Fps is (10 sec: 9011.2, 60 sec: 8806.4, 300 sec: 8528.1). Total num frames: 1662976. Throughput: 0: 2191.9. Samples: 414414. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) |
|
[2025-03-20 23:56:08,892][00031] Avg episode reward: [(0, '13.627')] |
|
[2025-03-20 23:56:08,893][00196] Saving new best policy, reward=13.627! |
|
[2025-03-20 23:56:10,793][00209] Updated weights for policy 0, policy_version 410 (0.0015) |
|
[2025-03-20 23:56:13,890][00031] Fps is (10 sec: 8192.1, 60 sec: 8669.9, 300 sec: 8499.2). Total num frames: 1699840. Throughput: 0: 2182.9. Samples: 420694. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) |
|
[2025-03-20 23:56:13,891][00031] Avg episode reward: [(0, '13.192')] |
|
[2025-03-20 23:56:15,967][00209] Updated weights for policy 0, policy_version 420 (0.0018) |
|
[2025-03-20 23:56:18,890][00031] Fps is (10 sec: 8191.8, 60 sec: 8669.8, 300 sec: 8511.7). Total num frames: 1744896. Throughput: 0: 2159.7. Samples: 432978. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) |
|
[2025-03-20 23:56:18,893][00031] Avg episode reward: [(0, '12.600')] |
|
[2025-03-20 23:56:20,412][00209] Updated weights for policy 0, policy_version 430 (0.0018) |
|
[2025-03-20 23:56:23,890][00031] Fps is (10 sec: 9011.2, 60 sec: 8738.1, 300 sec: 8523.6). Total num frames: 1789952. Throughput: 0: 2161.6. Samples: 446468. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) |
|
[2025-03-20 23:56:23,891][00031] Avg episode reward: [(0, '13.233')] |
|
[2025-03-20 23:56:25,083][00209] Updated weights for policy 0, policy_version 440 (0.0019) |
|
[2025-03-20 23:56:28,890][00031] Fps is (10 sec: 9011.4, 60 sec: 8738.2, 300 sec: 8534.9). Total num frames: 1835008. Throughput: 0: 2197.2. Samples: 453276. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) |
|
[2025-03-20 23:56:28,894][00031] Avg episode reward: [(0, '14.886')] |
|
[2025-03-20 23:56:28,896][00196] Saving new best policy, reward=14.886! |
|
[2025-03-20 23:56:29,641][00209] Updated weights for policy 0, policy_version 450 (0.0020) |
|
[2025-03-20 23:56:33,890][00031] Fps is (10 sec: 9011.1, 60 sec: 8738.1, 300 sec: 8545.7). Total num frames: 1880064. Throughput: 0: 2193.6. Samples: 466624. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) |
|
[2025-03-20 23:56:33,893][00031] Avg episode reward: [(0, '14.378')] |
|
[2025-03-20 23:56:33,907][00196] Saving /kaggle/working/train_dir/default_experiment/checkpoint_p0/checkpoint_000000459_1880064.pth... |
|
[2025-03-20 23:56:34,225][00209] Updated weights for policy 0, policy_version 460 (0.0021) |
|
[2025-03-20 23:56:38,841][00209] Updated weights for policy 0, policy_version 470 (0.0018) |
|
[2025-03-20 23:56:38,890][00031] Fps is (10 sec: 9011.2, 60 sec: 8806.4, 300 sec: 8556.1). Total num frames: 1925120. Throughput: 0: 2193.9. Samples: 479878. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) |
|
[2025-03-20 23:56:38,892][00031] Avg episode reward: [(0, '14.785')] |
|
[2025-03-20 23:56:43,539][00209] Updated weights for policy 0, policy_version 480 (0.0018) |
|
[2025-03-20 23:56:43,890][00031] Fps is (10 sec: 8601.7, 60 sec: 8806.4, 300 sec: 8548.2). Total num frames: 1966080. Throughput: 0: 2190.4. Samples: 486502. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) |
|
[2025-03-20 23:56:43,894][00031] Avg episode reward: [(0, '15.454')] |
|
[2025-03-20 23:56:43,907][00196] Saving new best policy, reward=15.454! |
|
[2025-03-20 23:56:48,608][00209] Updated weights for policy 0, policy_version 490 (0.0016) |
|
[2025-03-20 23:56:48,890][00031] Fps is (10 sec: 8192.0, 60 sec: 8738.2, 300 sec: 8540.6). Total num frames: 2007040. Throughput: 0: 2162.8. Samples: 498522. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) |
|
[2025-03-20 23:56:48,891][00031] Avg episode reward: [(0, '16.388')] |
|
[2025-03-20 23:56:48,893][00196] Saving new best policy, reward=16.388! |
|
[2025-03-20 23:56:53,287][00209] Updated weights for policy 0, policy_version 500 (0.0018) |
|
[2025-03-20 23:56:53,890][00031] Fps is (10 sec: 8601.6, 60 sec: 8738.2, 300 sec: 8550.4). Total num frames: 2052096. Throughput: 0: 2164.8. Samples: 511828. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) |
|
[2025-03-20 23:56:53,891][00031] Avg episode reward: [(0, '16.824')] |
|
[2025-03-20 23:56:53,898][00196] Saving new best policy, reward=16.824! |
|
[2025-03-20 23:56:57,932][00209] Updated weights for policy 0, policy_version 510 (0.0017) |
|
[2025-03-20 23:56:58,890][00031] Fps is (10 sec: 9010.7, 60 sec: 8738.1, 300 sec: 8559.8). Total num frames: 2097152. Throughput: 0: 2175.8. Samples: 518604. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) |
|
[2025-03-20 23:56:58,893][00031] Avg episode reward: [(0, '17.144')] |
|
[2025-03-20 23:56:58,898][00196] Saving new best policy, reward=17.144! |
|
[2025-03-20 23:57:02,564][00209] Updated weights for policy 0, policy_version 520 (0.0017) |
|
[2025-03-20 23:57:03,890][00031] Fps is (10 sec: 8601.6, 60 sec: 8669.9, 300 sec: 8552.4). Total num frames: 2138112. Throughput: 0: 2196.2. Samples: 531806. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) |
|
[2025-03-20 23:57:03,892][00031] Avg episode reward: [(0, '17.762')] |
|
[2025-03-20 23:57:03,903][00196] Saving new best policy, reward=17.762! |
|
[2025-03-20 23:57:07,113][00209] Updated weights for policy 0, policy_version 530 (0.0015) |
|
[2025-03-20 23:57:08,890][00031] Fps is (10 sec: 8602.0, 60 sec: 8669.9, 300 sec: 8561.4). Total num frames: 2183168. Throughput: 0: 2193.9. Samples: 545194. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) |
|
[2025-03-20 23:57:08,893][00031] Avg episode reward: [(0, '17.381')] |
|
[2025-03-20 23:57:11,879][00209] Updated weights for policy 0, policy_version 540 (0.0015) |
|
[2025-03-20 23:57:13,890][00031] Fps is (10 sec: 9011.2, 60 sec: 8806.4, 300 sec: 8570.1). Total num frames: 2228224. Throughput: 0: 2186.3. Samples: 551658. Policy #0 lag: (min: 0.0, avg: 0.6, max: 3.0) |
|
[2025-03-20 23:57:13,893][00031] Avg episode reward: [(0, '16.917')] |
|
[2025-03-20 23:57:16,463][00209] Updated weights for policy 0, policy_version 550 (0.0017) |
|
[2025-03-20 23:57:18,890][00031] Fps is (10 sec: 8601.6, 60 sec: 8738.2, 300 sec: 8563.0). Total num frames: 2269184. Throughput: 0: 2175.1. Samples: 564504. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) |
|
[2025-03-20 23:57:18,892][00031] Avg episode reward: [(0, '16.238')] |
|
[2025-03-20 23:57:21,613][00209] Updated weights for policy 0, policy_version 560 (0.0017) |
|
[2025-03-20 23:57:23,890][00031] Fps is (10 sec: 8601.6, 60 sec: 8738.1, 300 sec: 8571.3). Total num frames: 2314240. Throughput: 0: 2158.6. Samples: 577014. Policy #0 lag: (min: 0.0, avg: 0.6, max: 3.0) |
|
[2025-03-20 23:57:23,892][00031] Avg episode reward: [(0, '18.369')] |
|
[2025-03-20 23:57:23,900][00196] Saving new best policy, reward=18.369! |
|
[2025-03-20 23:57:26,205][00209] Updated weights for policy 0, policy_version 570 (0.0015) |
|
[2025-03-20 23:57:28,890][00031] Fps is (10 sec: 8601.6, 60 sec: 8669.9, 300 sec: 8564.4). Total num frames: 2355200. Throughput: 0: 2160.6. Samples: 583730. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) |
|
[2025-03-20 23:57:28,892][00031] Avg episode reward: [(0, '19.551')] |
|
[2025-03-20 23:57:28,894][00196] Saving new best policy, reward=19.551! |
|
[2025-03-20 23:57:30,803][00209] Updated weights for policy 0, policy_version 580 (0.0022) |
|
[2025-03-20 23:57:33,890][00031] Fps is (10 sec: 8601.6, 60 sec: 8669.9, 300 sec: 8572.3). Total num frames: 2400256. Throughput: 0: 2190.8. Samples: 597106. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) |
|
[2025-03-20 23:57:33,891][00031] Avg episode reward: [(0, '16.258')] |
|
[2025-03-20 23:57:35,353][00209] Updated weights for policy 0, policy_version 590 (0.0016) |
|
[2025-03-20 23:57:38,890][00031] Fps is (10 sec: 9011.2, 60 sec: 8669.9, 300 sec: 8580.0). Total num frames: 2445312. Throughput: 0: 2196.6. Samples: 610674. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) |
|
[2025-03-20 23:57:38,892][00031] Avg episode reward: [(0, '17.396')] |
|
[2025-03-20 23:57:39,921][00209] Updated weights for policy 0, policy_version 600 (0.0015) |
|
[2025-03-20 23:57:43,890][00031] Fps is (10 sec: 9011.2, 60 sec: 8738.1, 300 sec: 8587.5). Total num frames: 2490368. Throughput: 0: 2193.1. Samples: 617294. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) |
|
[2025-03-20 23:57:43,892][00031] Avg episode reward: [(0, '19.002')] |
|
[2025-03-20 23:57:44,471][00209] Updated weights for policy 0, policy_version 610 (0.0019) |
|
[2025-03-20 23:57:48,890][00031] Fps is (10 sec: 9011.2, 60 sec: 8806.4, 300 sec: 8594.7). Total num frames: 2535424. Throughput: 0: 2199.3. Samples: 630776. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) |
|
[2025-03-20 23:57:48,891][00031] Avg episode reward: [(0, '18.454')] |
|
[2025-03-20 23:57:49,083][00209] Updated weights for policy 0, policy_version 620 (0.0017) |
|
[2025-03-20 23:57:53,890][00031] Fps is (10 sec: 8601.7, 60 sec: 8738.1, 300 sec: 8691.9). Total num frames: 2576384. Throughput: 0: 2163.7. Samples: 642562. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) |
|
[2025-03-20 23:57:53,891][00031] Avg episode reward: [(0, '19.292')] |
|
[2025-03-20 23:57:54,296][00209] Updated weights for policy 0, policy_version 630 (0.0015) |
|
[2025-03-20 23:57:58,876][00209] Updated weights for policy 0, policy_version 640 (0.0015) |
|
[2025-03-20 23:57:58,892][00031] Fps is (10 sec: 8599.4, 60 sec: 8737.8, 300 sec: 8719.5). Total num frames: 2621440. Throughput: 0: 2171.7. Samples: 649390. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) |
|
[2025-03-20 23:57:58,894][00031] Avg episode reward: [(0, '20.406')] |
|
[2025-03-20 23:57:58,899][00196] Saving new best policy, reward=20.406! |
|
[2025-03-20 23:58:03,580][00209] Updated weights for policy 0, policy_version 650 (0.0017) |
|
[2025-03-20 23:58:03,890][00031] Fps is (10 sec: 8601.5, 60 sec: 8738.1, 300 sec: 8705.7). Total num frames: 2662400. Throughput: 0: 2181.0. Samples: 662650. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) |
|
[2025-03-20 23:58:03,891][00031] Avg episode reward: [(0, '22.802')] |
|
[2025-03-20 23:58:03,902][00196] Saving new best policy, reward=22.802! |
|
[2025-03-20 23:58:08,216][00209] Updated weights for policy 0, policy_version 660 (0.0017) |
|
[2025-03-20 23:58:08,890][00031] Fps is (10 sec: 8603.8, 60 sec: 8738.1, 300 sec: 8719.6). Total num frames: 2707456. Throughput: 0: 2200.3. Samples: 676026. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) |
|
[2025-03-20 23:58:08,893][00031] Avg episode reward: [(0, '22.575')] |
|
[2025-03-20 23:58:12,783][00209] Updated weights for policy 0, policy_version 670 (0.0016) |
|
[2025-03-20 23:58:13,890][00031] Fps is (10 sec: 9011.2, 60 sec: 8738.1, 300 sec: 8719.6). Total num frames: 2752512. Throughput: 0: 2195.7. Samples: 682538. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) |
|
[2025-03-20 23:58:13,891][00031] Avg episode reward: [(0, '22.777')] |
|
[2025-03-20 23:58:17,311][00209] Updated weights for policy 0, policy_version 680 (0.0016) |
|
[2025-03-20 23:58:18,890][00031] Fps is (10 sec: 9011.2, 60 sec: 8806.4, 300 sec: 8719.6). Total num frames: 2797568. Throughput: 0: 2200.3. Samples: 696118. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) |
|
[2025-03-20 23:58:18,891][00031] Avg episode reward: [(0, '21.434')] |
|
[2025-03-20 23:58:21,942][00209] Updated weights for policy 0, policy_version 690 (0.0017) |
|
[2025-03-20 23:58:23,890][00031] Fps is (10 sec: 8601.7, 60 sec: 8738.1, 300 sec: 8719.6). Total num frames: 2838528. Throughput: 0: 2178.3. Samples: 708698. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) |
|
[2025-03-20 23:58:23,891][00031] Avg episode reward: [(0, '23.660')] |
|
[2025-03-20 23:58:23,901][00196] Saving new best policy, reward=23.660! |
|
[2025-03-20 23:58:27,108][00209] Updated weights for policy 0, policy_version 700 (0.0018) |
|
[2025-03-20 23:58:28,890][00031] Fps is (10 sec: 8192.0, 60 sec: 8738.1, 300 sec: 8719.6). Total num frames: 2879488. Throughput: 0: 2167.2. Samples: 714820. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) |
|
[2025-03-20 23:58:28,891][00031] Avg episode reward: [(0, '23.117')] |
|
[2025-03-20 23:58:31,697][00209] Updated weights for policy 0, policy_version 710 (0.0018) |
|
[2025-03-20 23:58:33,890][00031] Fps is (10 sec: 8601.5, 60 sec: 8738.1, 300 sec: 8719.6). Total num frames: 2924544. Throughput: 0: 2163.2. Samples: 728122. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) |
|
[2025-03-20 23:58:33,891][00031] Avg episode reward: [(0, '22.923')] |
|
[2025-03-20 23:58:33,902][00196] Saving /kaggle/working/train_dir/default_experiment/checkpoint_p0/checkpoint_000000714_2924544.pth... |
|
[2025-03-20 23:58:34,003][00196] Removing /kaggle/working/train_dir/default_experiment/checkpoint_p0/checkpoint_000000203_831488.pth |
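
Checkpoints are written as checkpoint_<version>_<env_steps>.pth and older ones are pruned, so only the most recent few survive. A sketch of that rotation; keep_last=2 is inferred from the log, where two checkpoints are retained at any time:

```python
from pathlib import Path

import torch

def save_and_rotate(ckpt_dir: Path, version: int, env_steps: int,
                    state: dict, keep_last: int = 2) -> None:
    """Write checkpoint_<version>_<env_steps>.pth and prune older files (sketch)."""
    path = ckpt_dir / f"checkpoint_{version:09d}_{env_steps}.pth"
    torch.save(state, path)
    # Zero-padded versions make lexicographic order match chronological order.
    for old in sorted(ckpt_dir.glob("checkpoint_*.pth"))[:-keep_last]:
        print(f"Removing {old}")
        old.unlink()
```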
|
[2025-03-20 23:58:36,269][00209] Updated weights for policy 0, policy_version 720 (0.0014) |
|
[2025-03-20 23:58:38,890][00031] Fps is (10 sec: 9011.2, 60 sec: 8738.1, 300 sec: 8733.5). Total num frames: 2969600. Throughput: 0: 2200.7. Samples: 741594. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) |
|
[2025-03-20 23:58:38,891][00031] Avg episode reward: [(0, '22.964')] |
|
[2025-03-20 23:58:40,971][00209] Updated weights for policy 0, policy_version 730 (0.0020) |
|
[2025-03-20 23:58:43,890][00031] Fps is (10 sec: 9011.2, 60 sec: 8738.1, 300 sec: 8733.5). Total num frames: 3014656. Throughput: 0: 2193.5. Samples: 748092. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) |
|
[2025-03-20 23:58:43,892][00031] Avg episode reward: [(0, '23.219')] |
|
[2025-03-20 23:58:45,580][00209] Updated weights for policy 0, policy_version 740 (0.0018) |
|
[2025-03-20 23:58:48,890][00031] Fps is (10 sec: 9011.2, 60 sec: 8738.1, 300 sec: 8733.5). Total num frames: 3059712. Throughput: 0: 2195.2. Samples: 761432. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) |
|
[2025-03-20 23:58:48,891][00031] Avg episode reward: [(0, '23.719')] |
|
[2025-03-20 23:58:48,895][00196] Saving new best policy, reward=23.719! |
|
[2025-03-20 23:58:50,278][00209] Updated weights for policy 0, policy_version 750 (0.0019) |
|
[2025-03-20 23:58:53,890][00031] Fps is (10 sec: 8601.6, 60 sec: 8738.1, 300 sec: 8719.6). Total num frames: 3100672. Throughput: 0: 2188.4. Samples: 774506. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) |
|
[2025-03-20 23:58:53,892][00031] Avg episode reward: [(0, '22.639')] |
|
[2025-03-20 23:58:55,247][00209] Updated weights for policy 0, policy_version 760 (0.0016) |
|
[2025-03-20 23:58:58,890][00031] Fps is (10 sec: 8192.0, 60 sec: 8670.2, 300 sec: 8733.5). Total num frames: 3141632. Throughput: 0: 2166.0. Samples: 780008. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) |
|
[2025-03-20 23:58:58,892][00031] Avg episode reward: [(0, '21.204')] |
|
[2025-03-20 23:59:00,219][00209] Updated weights for policy 0, policy_version 770 (0.0018) |
|
[2025-03-20 23:59:03,890][00031] Fps is (10 sec: 8192.0, 60 sec: 8669.9, 300 sec: 8719.6). Total num frames: 3182592. Throughput: 0: 2150.4. Samples: 792886. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) |
|
[2025-03-20 23:59:03,892][00031] Avg episode reward: [(0, '19.295')] |
|
[2025-03-20 23:59:04,761][00209] Updated weights for policy 0, policy_version 780 (0.0016) |
|
[2025-03-20 23:59:08,890][00031] Fps is (10 sec: 9011.2, 60 sec: 8738.1, 300 sec: 8733.5). Total num frames: 3231744. Throughput: 0: 2171.4. Samples: 806410. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) |
|
[2025-03-20 23:59:08,891][00031] Avg episode reward: [(0, '19.920')] |
|
[2025-03-20 23:59:09,491][00209] Updated weights for policy 0, policy_version 790 (0.0017) |
|
[2025-03-20 23:59:13,890][00031] Fps is (10 sec: 9011.1, 60 sec: 8669.8, 300 sec: 8719.6). Total num frames: 3272704. Throughput: 0: 2181.6. Samples: 812994. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) |
|
[2025-03-20 23:59:13,892][00031] Avg episode reward: [(0, '22.246')] |
|
[2025-03-20 23:59:13,973][00209] Updated weights for policy 0, policy_version 800 (0.0015) |
|
[2025-03-20 23:59:18,496][00209] Updated weights for policy 0, policy_version 810 (0.0017) |
|
[2025-03-20 23:59:18,890][00031] Fps is (10 sec: 8601.6, 60 sec: 8669.9, 300 sec: 8719.6). Total num frames: 3317760. Throughput: 0: 2187.4. Samples: 826554. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) |
|
[2025-03-20 23:59:18,891][00031] Avg episode reward: [(0, '24.452')] |
|
[2025-03-20 23:59:18,927][00196] Saving new best policy, reward=24.452! |
|
[2025-03-20 23:59:23,206][00209] Updated weights for policy 0, policy_version 820 (0.0019) |
|
[2025-03-20 23:59:23,890][00031] Fps is (10 sec: 9011.3, 60 sec: 8738.1, 300 sec: 8719.6). Total num frames: 3362816. Throughput: 0: 2182.0. Samples: 839784. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) |
|
[2025-03-20 23:59:23,891][00031] Avg episode reward: [(0, '23.698')] |
|
[2025-03-20 23:59:28,103][00209] Updated weights for policy 0, policy_version 830 (0.0018) |
|
[2025-03-20 23:59:28,890][00031] Fps is (10 sec: 8601.6, 60 sec: 8738.1, 300 sec: 8719.6). Total num frames: 3403776. Throughput: 0: 2186.6. Samples: 846490. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) |
|
[2025-03-20 23:59:28,895][00031] Avg episode reward: [(0, '23.043')] |
|
[2025-03-20 23:59:33,012][00209] Updated weights for policy 0, policy_version 840 (0.0020) |
|
[2025-03-20 23:59:33,890][00031] Fps is (10 sec: 8601.4, 60 sec: 8738.1, 300 sec: 8733.5). Total num frames: 3448832. Throughput: 0: 2154.5. Samples: 858384. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) |
|
[2025-03-20 23:59:33,892][00031] Avg episode reward: [(0, '22.872')] |
|
[2025-03-20 23:59:37,599][00209] Updated weights for policy 0, policy_version 850 (0.0018) |
|
[2025-03-20 23:59:38,890][00031] Fps is (10 sec: 8601.6, 60 sec: 8669.9, 300 sec: 8719.6). Total num frames: 3489792. Throughput: 0: 2162.5. Samples: 871818. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) |
|
[2025-03-20 23:59:38,892][00031] Avg episode reward: [(0, '21.611')] |
|
[2025-03-20 23:59:42,356][00209] Updated weights for policy 0, policy_version 860 (0.0020) |
|
[2025-03-20 23:59:43,890][00031] Fps is (10 sec: 8601.8, 60 sec: 8669.9, 300 sec: 8719.6). Total num frames: 3534848. Throughput: 0: 2181.8. Samples: 878188. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) |
|
[2025-03-20 23:59:43,891][00031] Avg episode reward: [(0, '19.733')] |
|
[2025-03-20 23:59:47,120][00209] Updated weights for policy 0, policy_version 870 (0.0015) |
|
[2025-03-20 23:59:48,890][00031] Fps is (10 sec: 8601.6, 60 sec: 8601.6, 300 sec: 8705.7). Total num frames: 3575808. Throughput: 0: 2187.0. Samples: 891302. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) |
|
[2025-03-20 23:59:48,891][00031] Avg episode reward: [(0, '17.790')] |
|
[2025-03-20 23:59:51,811][00209] Updated weights for policy 0, policy_version 880 (0.0017) |
|
[2025-03-20 23:59:53,890][00031] Fps is (10 sec: 8601.6, 60 sec: 8669.9, 300 sec: 8705.7). Total num frames: 3620864. Throughput: 0: 2175.7. Samples: 904316. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) |
|
[2025-03-20 23:59:53,892][00031] Avg episode reward: [(0, '21.014')] |
|
[2025-03-20 23:59:56,362][00209] Updated weights for policy 0, policy_version 890 (0.0016) |
|
[2025-03-20 23:59:58,890][00031] Fps is (10 sec: 9011.2, 60 sec: 8738.1, 300 sec: 8719.6). Total num frames: 3665920. Throughput: 0: 2177.4. Samples: 910978. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) |
|
[2025-03-20 23:59:58,894][00031] Avg episode reward: [(0, '24.367')] |
|
[2025-03-21 00:00:01,498][00209] Updated weights for policy 0, policy_version 900 (0.0017) |
|
[2025-03-21 00:00:03,890][00031] Fps is (10 sec: 8191.9, 60 sec: 8669.9, 300 sec: 8705.7). Total num frames: 3702784. Throughput: 0: 2142.4. Samples: 922960. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) |
|
[2025-03-21 00:00:03,893][00031] Avg episode reward: [(0, '26.571')] |
|
[2025-03-21 00:00:03,902][00196] Saving new best policy, reward=26.571! |
|
[2025-03-21 00:00:06,280][00209] Updated weights for policy 0, policy_version 910 (0.0017) |
|
[2025-03-21 00:00:08,890][00031] Fps is (10 sec: 8192.0, 60 sec: 8601.6, 300 sec: 8705.7). Total num frames: 3747840. Throughput: 0: 2142.2. Samples: 936184. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) |
|
[2025-03-21 00:00:08,891][00031] Avg episode reward: [(0, '26.168')] |
|
[2025-03-21 00:00:10,874][00209] Updated weights for policy 0, policy_version 920 (0.0019) |
|
[2025-03-21 00:00:13,890][00031] Fps is (10 sec: 9011.2, 60 sec: 8669.9, 300 sec: 8705.7). Total num frames: 3792896. Throughput: 0: 2138.9. Samples: 942740. Policy #0 lag: (min: 0.0, avg: 0.5, max: 3.0) |
|
[2025-03-21 00:00:13,891][00031] Avg episode reward: [(0, '24.613')] |
|
[2025-03-21 00:00:15,418][00209] Updated weights for policy 0, policy_version 930 (0.0016) |
|
[2025-03-21 00:00:18,890][00031] Fps is (10 sec: 9011.1, 60 sec: 8669.9, 300 sec: 8719.6). Total num frames: 3837952. Throughput: 0: 2174.3. Samples: 956226. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) |
|
[2025-03-21 00:00:18,892][00031] Avg episode reward: [(0, '26.720')] |
|
[2025-03-21 00:00:18,895][00196] Saving new best policy, reward=26.720! |
|
[2025-03-21 00:00:20,113][00209] Updated weights for policy 0, policy_version 940 (0.0016) |
|
[2025-03-21 00:00:23,891][00031] Fps is (10 sec: 9010.5, 60 sec: 8669.7, 300 sec: 8719.6). Total num frames: 3883008. Throughput: 0: 2169.3. Samples: 969438. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) |
|
[2025-03-21 00:00:23,892][00031] Avg episode reward: [(0, '25.791')] |
|
[2025-03-21 00:00:24,729][00209] Updated weights for policy 0, policy_version 950 (0.0016) |
|
[2025-03-21 00:00:28,890][00031] Fps is (10 sec: 9011.2, 60 sec: 8738.1, 300 sec: 8719.6). Total num frames: 3928064. Throughput: 0: 2177.3. Samples: 976166. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) |
|
[2025-03-21 00:00:28,894][00031] Avg episode reward: [(0, '25.840')] |
|
[2025-03-21 00:00:29,403][00209] Updated weights for policy 0, policy_version 960 (0.0019) |
|
[2025-03-21 00:00:33,890][00031] Fps is (10 sec: 8601.7, 60 sec: 8669.8, 300 sec: 8719.6). Total num frames: 3969024. Throughput: 0: 2180.4. Samples: 989422. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) |
|
[2025-03-21 00:00:33,893][00031] Avg episode reward: [(0, '26.366')] |
|
[2025-03-21 00:00:33,902][00196] Saving /kaggle/working/train_dir/default_experiment/checkpoint_p0/checkpoint_000000969_3969024.pth... |
|
[2025-03-21 00:00:33,996][00196] Removing /kaggle/working/train_dir/default_experiment/checkpoint_p0/checkpoint_000000459_1880064.pth |
|
[2025-03-21 00:00:34,571][00209] Updated weights for policy 0, policy_version 970 (0.0015) |
|
[2025-03-21 00:00:38,890][00031] Fps is (10 sec: 8192.0, 60 sec: 8669.9, 300 sec: 8719.6). Total num frames: 4009984. Throughput: 0: 2155.2. Samples: 1001298. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) |
|
[2025-03-21 00:00:38,891][00031] Avg episode reward: [(0, '25.201')] |
|
[2025-03-21 00:00:39,210][00209] Updated weights for policy 0, policy_version 980 (0.0018) |
|
[2025-03-21 00:00:43,890][00031] Fps is (10 sec: 8192.6, 60 sec: 8601.6, 300 sec: 8705.7). Total num frames: 4050944. Throughput: 0: 2153.6. Samples: 1007892. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) |
|
[2025-03-21 00:00:43,892][00031] Avg episode reward: [(0, '25.953')] |
|
[2025-03-21 00:00:43,918][00209] Updated weights for policy 0, policy_version 990 (0.0015) |
|
[2025-03-21 00:00:48,415][00209] Updated weights for policy 0, policy_version 1000 (0.0016) |
|
[2025-03-21 00:00:48,890][00031] Fps is (10 sec: 9011.2, 60 sec: 8738.1, 300 sec: 8719.6). Total num frames: 4100096. Throughput: 0: 2187.6. Samples: 1021400. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) |
|
[2025-03-21 00:00:48,894][00031] Avg episode reward: [(0, '25.095')] |
|
[2025-03-21 00:00:53,041][00209] Updated weights for policy 0, policy_version 1010 (0.0016) |
|
[2025-03-21 00:00:53,891][00031] Fps is (10 sec: 9419.3, 60 sec: 8737.9, 300 sec: 8719.6). Total num frames: 4145152. Throughput: 0: 2191.3. Samples: 1034798. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) |
|
[2025-03-21 00:00:53,893][00031] Avg episode reward: [(0, '26.122')] |
|
[2025-03-21 00:00:57,515][00209] Updated weights for policy 0, policy_version 1020 (0.0015) |
|
[2025-03-21 00:00:58,890][00031] Fps is (10 sec: 9011.3, 60 sec: 8738.2, 300 sec: 8719.6). Total num frames: 4190208. Throughput: 0: 2195.9. Samples: 1041556. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) |
|
[2025-03-21 00:00:58,891][00031] Avg episode reward: [(0, '27.882')] |
|
[2025-03-21 00:00:58,897][00196] Saving new best policy, reward=27.882! |
|
[2025-03-21 00:01:02,208][00209] Updated weights for policy 0, policy_version 1030 (0.0016) |
|
[2025-03-21 00:01:03,890][00031] Fps is (10 sec: 8602.9, 60 sec: 8806.4, 300 sec: 8705.7). Total num frames: 4231168. Throughput: 0: 2191.6. Samples: 1054848. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) |
|
[2025-03-21 00:01:03,892][00031] Avg episode reward: [(0, '29.140')] |
|
[2025-03-21 00:01:03,900][00196] Saving new best policy, reward=29.140! |
|
[2025-03-21 00:01:07,419][00209] Updated weights for policy 0, policy_version 1040 (0.0019) |
|
[2025-03-21 00:01:08,890][00031] Fps is (10 sec: 8191.9, 60 sec: 8738.1, 300 sec: 8719.6). Total num frames: 4272128. Throughput: 0: 2160.8. Samples: 1066674. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) |
|
[2025-03-21 00:01:08,894][00031] Avg episode reward: [(0, '27.887')] |
|
[2025-03-21 00:01:12,087][00209] Updated weights for policy 0, policy_version 1050 (0.0018) |
|
[2025-03-21 00:01:13,890][00031] Fps is (10 sec: 8192.0, 60 sec: 8669.9, 300 sec: 8705.7). Total num frames: 4313088. Throughput: 0: 2157.5. Samples: 1073254. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) |
|
[2025-03-21 00:01:13,892][00031] Avg episode reward: [(0, '24.894')] |
|
[2025-03-21 00:01:16,687][00209] Updated weights for policy 0, policy_version 1060 (0.0018) |
|
[2025-03-21 00:01:18,890][00031] Fps is (10 sec: 8601.6, 60 sec: 8669.9, 300 sec: 8705.7). Total num frames: 4358144. Throughput: 0: 2160.6. Samples: 1086648. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) |
|
[2025-03-21 00:01:18,894][00031] Avg episode reward: [(0, '23.185')] |
|
[2025-03-21 00:01:21,376][00209] Updated weights for policy 0, policy_version 1070 (0.0018) |
|
[2025-03-21 00:01:23,892][00031] Fps is (10 sec: 9008.9, 60 sec: 8669.6, 300 sec: 8705.7). Total num frames: 4403200. Throughput: 0: 2191.8. Samples: 1099934. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) |
|
[2025-03-21 00:01:23,894][00031] Avg episode reward: [(0, '26.089')] |
|
[2025-03-21 00:01:25,877][00209] Updated weights for policy 0, policy_version 1080 (0.0018) |
|
[2025-03-21 00:01:28,890][00031] Fps is (10 sec: 9011.2, 60 sec: 8669.9, 300 sec: 8705.7). Total num frames: 4448256. Throughput: 0: 2194.6. Samples: 1106650. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) |
|
[2025-03-21 00:01:28,891][00031] Avg episode reward: [(0, '26.792')] |
|
[2025-03-21 00:01:30,500][00209] Updated weights for policy 0, policy_version 1090 (0.0018) |
|
[2025-03-21 00:01:33,890][00031] Fps is (10 sec: 9013.0, 60 sec: 8738.2, 300 sec: 8705.7). Total num frames: 4493312. Throughput: 0: 2190.3. Samples: 1119964. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) |
|
[2025-03-21 00:01:33,893][00031] Avg episode reward: [(0, '27.977')] |
|
[2025-03-21 00:01:35,076][00209] Updated weights for policy 0, policy_version 1100 (0.0018) |
|
[2025-03-21 00:01:38,891][00031] Fps is (10 sec: 8600.4, 60 sec: 8737.9, 300 sec: 8705.7). Total num frames: 4534272. Throughput: 0: 2189.0. Samples: 1133304. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) |
|
[2025-03-21 00:01:38,894][00031] Avg episode reward: [(0, '28.550')] |
|
[2025-03-21 00:01:40,332][00209] Updated weights for policy 0, policy_version 1110 (0.0021) |
|
[2025-03-21 00:01:43,890][00031] Fps is (10 sec: 8192.3, 60 sec: 8738.1, 300 sec: 8705.7). Total num frames: 4575232. Throughput: 0: 2155.8. Samples: 1138566. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) |
|
[2025-03-21 00:01:43,892][00031] Avg episode reward: [(0, '28.701')] |
|
[2025-03-21 00:01:44,888][00209] Updated weights for policy 0, policy_version 1120 (0.0016) |
|
[2025-03-21 00:01:48,890][00031] Fps is (10 sec: 8602.4, 60 sec: 8669.8, 300 sec: 8705.7). Total num frames: 4620288. Throughput: 0: 2160.4. Samples: 1152066. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) |
|
[2025-03-21 00:01:48,892][00031] Avg episode reward: [(0, '25.380')] |
|
[2025-03-21 00:01:49,565][00209] Updated weights for policy 0, policy_version 1130 (0.0017) |
|
[2025-03-21 00:01:53,890][00031] Fps is (10 sec: 9011.4, 60 sec: 8670.1, 300 sec: 8705.8). Total num frames: 4665344. Throughput: 0: 2189.7. Samples: 1165212. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) |
|
[2025-03-21 00:01:53,891][00031] Avg episode reward: [(0, '24.286')] |
|
[2025-03-21 00:01:54,227][00209] Updated weights for policy 0, policy_version 1140 (0.0014) |
|
[2025-03-21 00:01:58,758][00209] Updated weights for policy 0, policy_version 1150 (0.0016) |
|
[2025-03-21 00:01:58,890][00031] Fps is (10 sec: 9011.4, 60 sec: 8669.8, 300 sec: 8719.6). Total num frames: 4710400. Throughput: 0: 2192.9. Samples: 1171936. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) |
|
[2025-03-21 00:01:58,892][00031] Avg episode reward: [(0, '26.960')] |
|
[2025-03-21 00:02:03,439][00209] Updated weights for policy 0, policy_version 1160 (0.0015) |
|
[2025-03-21 00:02:03,890][00031] Fps is (10 sec: 8601.6, 60 sec: 8669.9, 300 sec: 8705.7). Total num frames: 4751360. Throughput: 0: 2187.5. Samples: 1185084. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) |
|
[2025-03-21 00:02:03,892][00031] Avg episode reward: [(0, '30.019')] |
|
[2025-03-21 00:02:03,900][00196] Saving new best policy, reward=30.019! |
|
[2025-03-21 00:02:08,171][00209] Updated weights for policy 0, policy_version 1170 (0.0018) |
|
[2025-03-21 00:02:08,890][00031] Fps is (10 sec: 8601.7, 60 sec: 8738.1, 300 sec: 8705.7). Total num frames: 4796416. Throughput: 0: 2186.2. Samples: 1198308. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) |
|
[2025-03-21 00:02:08,892][00031] Avg episode reward: [(0, '28.972')] |
|
[2025-03-21 00:02:13,363][00209] Updated weights for policy 0, policy_version 1180 (0.0018) |
|
[2025-03-21 00:02:13,890][00031] Fps is (10 sec: 8601.6, 60 sec: 8738.1, 300 sec: 8705.7). Total num frames: 4837376. Throughput: 0: 2176.4. Samples: 1204590. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) |
|
[2025-03-21 00:02:13,891][00031] Avg episode reward: [(0, '25.194')] |
|
[2025-03-21 00:02:17,927][00209] Updated weights for policy 0, policy_version 1190 (0.0015) |
|
[2025-03-21 00:02:18,890][00031] Fps is (10 sec: 8601.7, 60 sec: 8738.1, 300 sec: 8705.7). Total num frames: 4882432. Throughput: 0: 2156.2. Samples: 1216994. Policy #0 lag: (min: 0.0, avg: 0.5, max: 3.0) |
|
[2025-03-21 00:02:18,892][00031] Avg episode reward: [(0, '23.589')] |
|
[2025-03-21 00:02:22,520][00209] Updated weights for policy 0, policy_version 1200 (0.0020) |
|
[2025-03-21 00:02:23,890][00031] Fps is (10 sec: 9011.0, 60 sec: 8738.5, 300 sec: 8719.6). Total num frames: 4927488. Throughput: 0: 2157.2. Samples: 1230376. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) |
|
[2025-03-21 00:02:23,892][00031] Avg episode reward: [(0, '23.189')] |
|
[2025-03-21 00:02:27,084][00209] Updated weights for policy 0, policy_version 1210 (0.0017) |
|
[2025-03-21 00:02:28,890][00031] Fps is (10 sec: 9011.2, 60 sec: 8738.1, 300 sec: 8719.6). Total num frames: 4972544. Throughput: 0: 2190.6. Samples: 1237144. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) |
|
[2025-03-21 00:02:28,891][00031] Avg episode reward: [(0, '25.520')] |
|
[2025-03-21 00:02:31,795][00209] Updated weights for policy 0, policy_version 1220 (0.0016) |
|
[2025-03-21 00:02:33,890][00031] Fps is (10 sec: 8601.8, 60 sec: 8669.9, 300 sec: 8705.7). Total num frames: 5013504. Throughput: 0: 2183.5. Samples: 1250322. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) |
|
[2025-03-21 00:02:33,891][00031] Avg episode reward: [(0, '25.585')] |
|
[2025-03-21 00:02:33,901][00196] Saving /kaggle/working/train_dir/default_experiment/checkpoint_p0/checkpoint_000001224_5013504.pth... |
|
[2025-03-21 00:02:33,900][00031] Components not started: RolloutWorker_w5, wait_time=600.0 seconds |
|
[2025-03-21 00:02:34,000][00196] Removing /kaggle/working/train_dir/default_experiment/checkpoint_p0/checkpoint_000000714_2924544.pth |
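
These "Saving .../Removing ..." pairs implement rolling checkpoints: the newest checkpoint is written, then the oldest one beyond a keep limit is deleted. A minimal sketch under the naming scheme visible in the log (the keep_count of 2 and the contents of state are assumptions):

import glob
import os
import torch

def save_rolling_checkpoint(state, ckpt_dir, policy_version, env_steps, keep_count=2):
    # name mirrors the log's pattern: checkpoint_<version>_<env_steps>.pth
    path = os.path.join(ckpt_dir, f"checkpoint_{policy_version:09d}_{env_steps}.pth")
    torch.save(state, path)
    # zero-padded names sort chronologically, so drop everything
    # older than the newest keep_count checkpoints
    for old in sorted(glob.glob(os.path.join(ckpt_dir, "checkpoint_*.pth")))[:-keep_count]:
        os.remove(old)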
|
[2025-03-21 00:02:36,344][00209] Updated weights for policy 0, policy_version 1230 (0.0019) |
|
[2025-03-21 00:02:38,890][00031] Fps is (10 sec: 8601.5, 60 sec: 8738.3, 300 sec: 8705.7). Total num frames: 5058560. Throughput: 0: 2187.8. Samples: 1263662. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) |
|
[2025-03-21 00:02:38,892][00031] Avg episode reward: [(0, '26.212')] |
|
[2025-03-21 00:02:40,957][00209] Updated weights for policy 0, policy_version 1240 (0.0017) |
|
[2025-03-21 00:02:43,890][00031] Fps is (10 sec: 8601.6, 60 sec: 8738.2, 300 sec: 8691.8). Total num frames: 5099520. Throughput: 0: 2184.8. Samples: 1270252. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) |
|
[2025-03-21 00:02:43,895][00031] Avg episode reward: [(0, '27.944')] |
|
[2025-03-21 00:02:46,282][00209] Updated weights for policy 0, policy_version 1250 (0.0016) |
|
[2025-03-21 00:02:48,890][00031] Fps is (10 sec: 8192.0, 60 sec: 8669.9, 300 sec: 8691.9). Total num frames: 5140480. Throughput: 0: 2157.2. Samples: 1282160. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) |
|
[2025-03-21 00:02:48,891][00031] Avg episode reward: [(0, '28.365')] |
|
[2025-03-21 00:02:50,953][00209] Updated weights for policy 0, policy_version 1260 (0.0018) |
|
[2025-03-21 00:02:53,890][00031] Fps is (10 sec: 8601.6, 60 sec: 8669.9, 300 sec: 8691.9). Total num frames: 5185536. Throughput: 0: 2156.5. Samples: 1295350. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) |
|
[2025-03-21 00:02:53,892][00031] Avg episode reward: [(0, '27.965')] |
|
[2025-03-21 00:02:55,486][00209] Updated weights for policy 0, policy_version 1270 (0.0017) |
|
[2025-03-21 00:02:58,890][00031] Fps is (10 sec: 9011.2, 60 sec: 8669.9, 300 sec: 8705.7). Total num frames: 5230592. Throughput: 0: 2167.2. Samples: 1302116. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) |
|
[2025-03-21 00:02:58,892][00031] Avg episode reward: [(0, '27.812')] |
|
[2025-03-21 00:03:00,065][00209] Updated weights for policy 0, policy_version 1280 (0.0017) |
|
[2025-03-21 00:03:03,890][00031] Fps is (10 sec: 9011.2, 60 sec: 8738.1, 300 sec: 8705.7). Total num frames: 5275648. Throughput: 0: 2187.2. Samples: 1315418. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) |
|
[2025-03-21 00:03:03,893][00031] Avg episode reward: [(0, '25.359')] |
|
[2025-03-21 00:03:04,783][00209] Updated weights for policy 0, policy_version 1290 (0.0018) |
|
[2025-03-21 00:03:08,890][00031] Fps is (10 sec: 9011.3, 60 sec: 8738.1, 300 sec: 8705.7). Total num frames: 5320704. Throughput: 0: 2187.6. Samples: 1328818. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) |
|
[2025-03-21 00:03:08,893][00031] Avg episode reward: [(0, '23.290')] |
|
[2025-03-21 00:03:09,429][00209] Updated weights for policy 0, policy_version 1300 (0.0017) |
|
[2025-03-21 00:03:13,890][00031] Fps is (10 sec: 8601.7, 60 sec: 8738.1, 300 sec: 8691.9). Total num frames: 5361664. Throughput: 0: 2183.9. Samples: 1335418. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) |
|
[2025-03-21 00:03:13,891][00031] Avg episode reward: [(0, '23.815')] |
|
[2025-03-21 00:03:13,936][00209] Updated weights for policy 0, policy_version 1310 (0.0015) |
|
[2025-03-21 00:03:18,890][00031] Fps is (10 sec: 8192.0, 60 sec: 8669.9, 300 sec: 8691.9). Total num frames: 5402624. Throughput: 0: 2173.2. Samples: 1348116. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) |
|
[2025-03-21 00:03:18,892][00031] Avg episode reward: [(0, '22.840')] |
|
[2025-03-21 00:03:19,197][00209] Updated weights for policy 0, policy_version 1320 (0.0021) |
|
[2025-03-21 00:03:23,685][00209] Updated weights for policy 0, policy_version 1330 (0.0017) |
|
[2025-03-21 00:03:23,890][00031] Fps is (10 sec: 8601.5, 60 sec: 8669.9, 300 sec: 8705.7). Total num frames: 5447680. Throughput: 0: 2157.0. Samples: 1360728. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) |
|
[2025-03-21 00:03:23,891][00031] Avg episode reward: [(0, '25.047')] |
|
[2025-03-21 00:03:28,336][00209] Updated weights for policy 0, policy_version 1340 (0.0017) |
|
[2025-03-21 00:03:28,890][00031] Fps is (10 sec: 9011.2, 60 sec: 8669.9, 300 sec: 8705.7). Total num frames: 5492736. Throughput: 0: 2161.6. Samples: 1367522. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) |
|
[2025-03-21 00:03:28,891][00031] Avg episode reward: [(0, '27.608')] |
|
[2025-03-21 00:03:32,931][00209] Updated weights for policy 0, policy_version 1350 (0.0017) |
|
[2025-03-21 00:03:33,890][00031] Fps is (10 sec: 9011.2, 60 sec: 8738.1, 300 sec: 8705.7). Total num frames: 5537792. Throughput: 0: 2193.4. Samples: 1380862. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) |
|
[2025-03-21 00:03:33,892][00031] Avg episode reward: [(0, '27.232')] |
|
[2025-03-21 00:03:37,458][00209] Updated weights for policy 0, policy_version 1360 (0.0017) |
|
[2025-03-21 00:03:38,890][00031] Fps is (10 sec: 9010.9, 60 sec: 8738.1, 300 sec: 8705.7). Total num frames: 5582848. Throughput: 0: 2199.8. Samples: 1394340. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) |
|
[2025-03-21 00:03:38,892][00031] Avg episode reward: [(0, '23.707')] |
|
[2025-03-21 00:03:42,131][00209] Updated weights for policy 0, policy_version 1370 (0.0018) |
|
[2025-03-21 00:03:43,890][00031] Fps is (10 sec: 8601.5, 60 sec: 8738.1, 300 sec: 8691.8). Total num frames: 5623808. Throughput: 0: 2195.7. Samples: 1400922. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) |
|
[2025-03-21 00:03:43,891][00031] Avg episode reward: [(0, '22.824')] |
|
[2025-03-21 00:03:46,674][00209] Updated weights for policy 0, policy_version 1380 (0.0017) |
|
[2025-03-21 00:03:48,890][00031] Fps is (10 sec: 8601.8, 60 sec: 8806.4, 300 sec: 8705.7). Total num frames: 5668864. Throughput: 0: 2198.8. Samples: 1414362. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) |
|
[2025-03-21 00:03:48,892][00031] Avg episode reward: [(0, '24.680')] |
|
[2025-03-21 00:03:51,975][00209] Updated weights for policy 0, policy_version 1390 (0.0018) |
|
[2025-03-21 00:03:53,890][00031] Fps is (10 sec: 8601.2, 60 sec: 8738.1, 300 sec: 8705.7). Total num frames: 5709824. Throughput: 0: 2161.7. Samples: 1426098. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) |
|
[2025-03-21 00:03:53,891][00031] Avg episode reward: [(0, '25.548')] |
|
[2025-03-21 00:03:56,647][00209] Updated weights for policy 0, policy_version 1400 (0.0015) |
|
[2025-03-21 00:03:58,890][00031] Fps is (10 sec: 8191.8, 60 sec: 8669.8, 300 sec: 8705.7). Total num frames: 5750784. Throughput: 0: 2161.3. Samples: 1432678. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) |
|
[2025-03-21 00:03:58,892][00031] Avg episode reward: [(0, '26.836')] |
|
[2025-03-21 00:04:01,353][00209] Updated weights for policy 0, policy_version 1410 (0.0021) |
|
[2025-03-21 00:04:03,890][00031] Fps is (10 sec: 8602.0, 60 sec: 8669.9, 300 sec: 8691.8). Total num frames: 5795840. Throughput: 0: 2167.0. Samples: 1445630. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) |
|
[2025-03-21 00:04:03,891][00031] Avg episode reward: [(0, '27.570')] |
|
[2025-03-21 00:04:06,186][00209] Updated weights for policy 0, policy_version 1420 (0.0017) |
|
[2025-03-21 00:04:08,890][00031] Fps is (10 sec: 8601.8, 60 sec: 8601.6, 300 sec: 8691.9). Total num frames: 5836800. Throughput: 0: 2177.2. Samples: 1458702. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) |
|
[2025-03-21 00:04:08,892][00031] Avg episode reward: [(0, '27.254')] |
|
[2025-03-21 00:04:10,817][00209] Updated weights for policy 0, policy_version 1430 (0.0017) |
|
[2025-03-21 00:04:13,890][00031] Fps is (10 sec: 8601.6, 60 sec: 8669.8, 300 sec: 8691.8). Total num frames: 5881856. Throughput: 0: 2171.7. Samples: 1465250. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) |
|
[2025-03-21 00:04:13,891][00031] Avg episode reward: [(0, '27.562')] |
|
[2025-03-21 00:04:15,379][00209] Updated weights for policy 0, policy_version 1440 (0.0018) |
|
[2025-03-21 00:04:18,890][00031] Fps is (10 sec: 9011.2, 60 sec: 8738.1, 300 sec: 8691.9). Total num frames: 5926912. Throughput: 0: 2173.6. Samples: 1478672. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) |
|
[2025-03-21 00:04:18,891][00031] Avg episode reward: [(0, '29.306')] |
|
[2025-03-21 00:04:20,052][00209] Updated weights for policy 0, policy_version 1450 (0.0016) |
|
[2025-03-21 00:04:23,890][00031] Fps is (10 sec: 8601.0, 60 sec: 8669.8, 300 sec: 8691.8). Total num frames: 5967872. Throughput: 0: 2148.6. Samples: 1491028. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) |
|
[2025-03-21 00:04:23,894][00031] Avg episode reward: [(0, '29.435')] |
|
[2025-03-21 00:04:25,197][00209] Updated weights for policy 0, policy_version 1460 (0.0023) |
|
[2025-03-21 00:04:27,864][00196] Stopping Batcher_0... |
|
[2025-03-21 00:04:27,864][00196] Loop batcher_evt_loop terminating... |
|
[2025-03-21 00:04:27,865][00196] Saving /kaggle/working/train_dir/default_experiment/checkpoint_p0/checkpoint_000001466_6004736.pth... |
|
[2025-03-21 00:04:27,864][00031] Component Batcher_0 stopped! |
|
[2025-03-21 00:04:27,866][00031] Component RolloutWorker_w5 process died already! Don't wait for it. |
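
RolloutWorker_w5 never started (see the earlier "Components not started: RolloutWorker_w5, wait_time=600.0 seconds" entry), so training proceeded on seven of the eight requested rollout workers, and the runner skips joining the dead process here. A sketch of this kind of liveness-aware shutdown using the standard multiprocessing API; stop_component and its timeout are illustrative names:

from multiprocessing import Process

def stop_component(name, proc: Process, timeout=5.0):
    if not proc.is_alive():
        # matches the log: the process died earlier, so don't block on it
        print(f"Component {name} process died already! Don't wait for it.")
        return
    proc.join(timeout=timeout)
    if proc.is_alive():
        proc.terminate()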
|
[2025-03-21 00:04:27,898][00209] Weights refcount: 2 0 |
|
[2025-03-21 00:04:27,900][00209] Stopping InferenceWorker_p0-w0... |
|
[2025-03-21 00:04:27,900][00209] Loop inference_proc0-0_evt_loop terminating... |
|
[2025-03-21 00:04:27,901][00031] Component InferenceWorker_p0-w0 stopped! |
|
[2025-03-21 00:04:27,931][00031] Component RolloutWorker_w1 stopped! |
|
[2025-03-21 00:04:27,934][00212] Stopping RolloutWorker_w1... |
|
[2025-03-21 00:04:27,935][00212] Loop rollout_proc1_evt_loop terminating... |
|
[2025-03-21 00:04:27,951][00196] Removing /kaggle/working/train_dir/default_experiment/checkpoint_p0/checkpoint_000000969_3969024.pth |
|
[2025-03-21 00:04:27,965][00196] Saving new best policy, reward=30.402! |
|
[2025-03-21 00:04:27,975][00216] Stopping RolloutWorker_w6... |
|
[2025-03-21 00:04:27,976][00216] Loop rollout_proc6_evt_loop terminating... |
|
[2025-03-21 00:04:27,975][00031] Component RolloutWorker_w6 stopped! |
|
[2025-03-21 00:04:27,980][00213] Stopping RolloutWorker_w2... |
|
[2025-03-21 00:04:27,980][00031] Component RolloutWorker_w2 stopped! |
|
[2025-03-21 00:04:27,981][00213] Loop rollout_proc2_evt_loop terminating... |
|
[2025-03-21 00:04:28,076][00196] Saving /kaggle/working/train_dir/default_experiment/checkpoint_p0/checkpoint_000001466_6004736.pth... |
|
[2025-03-21 00:04:28,123][00214] Stopping RolloutWorker_w4... |
|
[2025-03-21 00:04:28,124][00214] Loop rollout_proc4_evt_loop terminating... |
|
[2025-03-21 00:04:28,123][00031] Component RolloutWorker_w4 stopped! |
|
[2025-03-21 00:04:28,131][00210] Stopping RolloutWorker_w0... |
|
[2025-03-21 00:04:28,131][00210] Loop rollout_proc0_evt_loop terminating... |
|
[2025-03-21 00:04:28,131][00031] Component RolloutWorker_w0 stopped! |
|
[2025-03-21 00:04:28,138][00211] Stopping RolloutWorker_w3... |
|
[2025-03-21 00:04:28,139][00211] Loop rollout_proc3_evt_loop terminating... |
|
[2025-03-21 00:04:28,138][00031] Component RolloutWorker_w3 stopped! |
|
[2025-03-21 00:04:28,158][00031] Component RolloutWorker_w7 stopped! |
|
[2025-03-21 00:04:28,159][00217] Stopping RolloutWorker_w7... |
|
[2025-03-21 00:04:28,160][00217] Loop rollout_proc7_evt_loop terminating... |
|
[2025-03-21 00:04:28,197][00196] Stopping LearnerWorker_p0... |
|
[2025-03-21 00:04:28,198][00196] Loop learner_proc0_evt_loop terminating... |
|
[2025-03-21 00:04:28,197][00031] Component LearnerWorker_p0 stopped! |
|
[2025-03-21 00:04:28,198][00031] Waiting for process learner_proc0 to stop... |
|
[2025-03-21 00:04:29,505][00031] Waiting for process inference_proc0-0 to join... |
|
[2025-03-21 00:04:29,510][00031] Waiting for process rollout_proc0 to join... |
|
[2025-03-21 00:04:29,816][00031] Waiting for process rollout_proc1 to join... |
|
[2025-03-21 00:04:29,819][00031] Waiting for process rollout_proc2 to join... |
|
[2025-03-21 00:04:29,934][00031] Waiting for process rollout_proc3 to join... |
|
[2025-03-21 00:04:29,936][00031] Waiting for process rollout_proc4 to join... |
|
[2025-03-21 00:04:29,937][00031] Waiting for process rollout_proc5 to join... |
|
[2025-03-21 00:04:29,938][00031] Waiting for process rollout_proc6 to join... |
|
[2025-03-21 00:04:29,962][00031] Waiting for process rollout_proc7 to join... |
|
[2025-03-21 00:04:29,963][00031] Batcher 0 profile tree view: |
|
batching: 36.8708, releasing_batches: 0.0387 |
|
[2025-03-21 00:04:29,964][00031] InferenceWorker_p0-w0 profile tree view: |
|
wait_policy: 0.0000

  wait_policy_total: 18.5605

update_model: 9.7350

  weight_update: 0.0015

one_step: 0.0028

  handle_policy_step: 632.3479

    deserialize: 18.6638, stack: 4.1556, obs_to_device_normalize: 152.6385, forward: 301.7508, send_messages: 31.4225

    prepare_outputs: 91.5719

      to_cpu: 57.8595
|
[2025-03-21 00:04:29,965][00031] Learner 0 profile tree view: |
|
misc: 0.0078, prepare_batch: 17.4161

train: 73.2391

  epoch_init: 0.0087, minibatch_init: 0.0091, losses_postprocess: 0.7866, kl_divergence: 0.7831, after_optimizer: 32.9973

  calculate_losses: 24.3027

    losses_init: 0.0057, forward_head: 1.2705, bptt_initial: 16.9615, tail: 1.0647, advantages_returns: 0.2736, losses: 2.3759

    bptt: 2.0419

      bptt_forward_core: 1.9426

  update: 13.7216

    clip: 1.2010
|
[2025-03-21 00:04:29,967][00031] RolloutWorker_w0 profile tree view: |
|
wait_for_trajectories: 0.3301, enqueue_policy_requests: 14.4142, env_step: 514.5388, overhead: 13.0106, complete_rollouts: 1.8910

save_policy_outputs: 18.1673

  split_output_tensors: 7.2651
|
[2025-03-21 00:04:29,967][00031] RolloutWorker_w7 profile tree view: |
|
wait_for_trajectories: 0.3332, enqueue_policy_requests: 15.0279, env_step: 495.7572, overhead: 13.8787, complete_rollouts: 2.2632

save_policy_outputs: 18.9713

  split_output_tensors: 7.6620
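
The profile tree views above come from nested named timers: each scope accumulates wall-clock seconds and is printed indented under its parent. A minimal sketch of such a timing tree built with a context manager (an illustration, not Sample Factory's actual Timing class):

import time
from collections import defaultdict
from contextlib import contextmanager

class TimingTree:
    def __init__(self):
        self.totals = defaultdict(float)  # "train/calculate_losses" -> seconds
        self.stack = []

    @contextmanager
    def timeit(self, name):
        self.stack.append(name)
        key = "/".join(self.stack)
        start = time.perf_counter()
        try:
            yield
        finally:
            self.totals[key] += time.perf_counter() - start
            self.stack.pop()

    def report(self):
        for key in sorted(self.totals):
            depth = key.count("/")
            print("  " * depth + f"{key.split('/')[-1]}: {self.totals[key]:.4f}")

Nesting the scopes, e.g. timeit("train") around timeit("calculate_losses"), reproduces the train/calculate_losses hierarchy shown in the Learner profile above.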
|
[2025-03-21 00:04:29,968][00031] Loop Runner_EvtLoop terminating... |
|
[2025-03-21 00:04:29,970][00031] Runner profile tree view: |
|
main_loop: 711.2594 |
|
[2025-03-21 00:04:29,971][00031] Collected {0: 6004736}, FPS: 8442.4 |
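
The final figure is consistent with the runner profile: total collected frames divided by main-loop time, 6004736 / 711.2594 ≈ 8442.4, is exactly the reported overall FPS. As a quick check:

total_frames = 6_004_736          # from "Collected {0: 6004736}"
main_loop_seconds = 711.2594      # from the Runner profile tree view

print(total_frames / main_loop_seconds)  # ~8442.4, as logged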
|
[2025-03-21 00:08:33,688][00031] Loading existing experiment configuration from /kaggle/working/train_dir/default_experiment/config.json |
|
[2025-03-21 00:08:33,689][00031] Overriding arg 'num_workers' with value 1 passed from command line |
|
[2025-03-21 00:08:33,690][00031] Adding new argument 'no_render'=True that is not in the saved config file! |
|
[2025-03-21 00:08:33,691][00031] Adding new argument 'save_video'=True that is not in the saved config file! |
|
[2025-03-21 00:08:33,691][00031] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! |
|
[2025-03-21 00:08:33,692][00031] Adding new argument 'video_name'=None that is not in the saved config file! |
|
[2025-03-21 00:08:33,693][00031] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! |
|
[2025-03-21 00:08:33,694][00031] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! |
|
[2025-03-21 00:08:33,696][00031] Adding new argument 'push_to_hub'=False that is not in the saved config file! |
|
[2025-03-21 00:08:33,696][00031] Adding new argument 'hf_repository'=None that is not in the saved config file! |
|
[2025-03-21 00:08:33,697][00031] Adding new argument 'policy_index'=0 that is not in the saved config file! |
|
[2025-03-21 00:08:33,698][00031] Adding new argument 'eval_deterministic'=False that is not in the saved config file! |
|
[2025-03-21 00:08:33,699][00031] Adding new argument 'train_script'=None that is not in the saved config file! |
|
[2025-03-21 00:08:33,700][00031] Adding new argument 'enjoy_script'=None that is not in the saved config file! |
|
[2025-03-21 00:08:33,700][00031] Using frameskip 1 and render_action_repeat=4 for evaluation |
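
The block of "Overriding arg ..." / "Adding new argument ..." lines shows the evaluation script merging the saved config.json with command-line values: keys already in the saved config are overridden, and keys absent from it are added with a warning. A minimal sketch of that merge, assuming the config is a flat JSON dict:

import json

def load_config_with_overrides(config_path, cli_args: dict):
    with open(config_path) as f:
        cfg = json.load(f)
    for key, value in cli_args.items():
        if key in cfg:
            print(f"Overriding arg '{key}' with value {value!r} passed from command line")
        else:
            print(f"Adding new argument '{key}'={value!r} that is not in the saved config file!")
        cfg[key] = value
    return cfg

# usage mirroring this run's overrides
cfg = load_config_with_overrides(
    "/kaggle/working/train_dir/default_experiment/config.json",
    {"num_workers": 1, "no_render": True, "save_video": True, "max_num_episodes": 10},
)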
|
[2025-03-21 00:08:33,730][00031] Doom resolution: 160x120, resize resolution: (128, 72) |
|
[2025-03-21 00:08:33,733][00031] RunningMeanStd input shape: (3, 72, 128) |
|
[2025-03-21 00:08:33,735][00031] RunningMeanStd input shape: (1,) |
|
[2025-03-21 00:08:33,749][00031] ConvEncoder: input_channels=3 |
|
[2025-03-21 00:08:33,848][00031] Conv encoder output size: 512 |
|
[2025-03-21 00:08:33,848][00031] Policy head output size: 512 |
|
[2025-03-21 00:08:34,076][00031] Loading state from checkpoint /kaggle/working/train_dir/default_experiment/checkpoint_p0/checkpoint_000001466_6004736.pth... |
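
Restoring the policy for evaluation is a torch.load / load_state_dict round trip on the checkpoint picked above. A hedged sketch; the "model" key is an assumption about the checkpoint's internal layout:

import torch

def load_policy(actor_critic: torch.nn.Module, ckpt_path: str) -> torch.nn.Module:
    checkpoint = torch.load(ckpt_path, map_location="cpu")
    # assumed layout: the actor-critic weights live under a "model" key
    actor_critic.load_state_dict(checkpoint["model"])
    return actor_critic.eval()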
|
[2025-03-21 00:08:34,917][00031] Num frames 100... |
|
[2025-03-21 00:08:35,051][00031] Num frames 200... |
|
[2025-03-21 00:08:35,204][00031] Num frames 300... |
|
[2025-03-21 00:08:35,334][00031] Num frames 400... |
|
[2025-03-21 00:08:35,472][00031] Num frames 500... |
|
[2025-03-21 00:08:35,606][00031] Num frames 600... |
|
[2025-03-21 00:08:35,720][00031] Avg episode rewards: #0: 11.400, true rewards: #0: 6.400 |
|
[2025-03-21 00:08:35,721][00031] Avg episode reward: 11.400, avg true_objective: 6.400 |
|
[2025-03-21 00:08:35,804][00031] Num frames 700... |
|
[2025-03-21 00:08:35,939][00031] Num frames 800... |
|
[2025-03-21 00:08:36,074][00031] Num frames 900... |
|
[2025-03-21 00:08:36,188][00031] Avg episode rewards: #0: 7.705, true rewards: #0: 4.705 |
|
[2025-03-21 00:08:36,189][00031] Avg episode reward: 7.705, avg true_objective: 4.705 |
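
The "Avg episode rewards" entries are cumulative means over all episodes finished so far, not per-episode scores: episode 1 scored 11.400, and an episode-2 score of 4.010 yields (11.400 + 4.010) / 2 = 7.705, the value printed above. In code:

episode_rewards = []

def report(new_reward):
    episode_rewards.append(new_reward)
    avg = sum(episode_rewards) / len(episode_rewards)
    print(f"Avg episode reward: {avg:.3f}")

report(11.400)  # Avg episode reward: 11.400
report(4.010)   # Avg episode reward: 7.705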
|
[2025-03-21 00:08:36,257][00031] Num frames 1000... |
|
[2025-03-21 00:08:36,371][00031] Num frames 1100... |
|
[2025-03-21 00:08:36,492][00031] Num frames 1200... |
|
[2025-03-21 00:08:36,614][00031] Num frames 1300... |
|
[2025-03-21 00:08:36,698][00031] Avg episode rewards: #0: 6.417, true rewards: #0: 4.417 |
|
[2025-03-21 00:08:36,699][00031] Avg episode reward: 6.417, avg true_objective: 4.417 |
|
[2025-03-21 00:08:36,794][00031] Num frames 1400... |
|
[2025-03-21 00:08:36,921][00031] Num frames 1500... |
|
[2025-03-21 00:08:37,037][00031] Num frames 1600... |
|
[2025-03-21 00:08:37,164][00031] Num frames 1700... |
|
[2025-03-21 00:08:37,289][00031] Num frames 1800... |
|
[2025-03-21 00:08:37,404][00031] Num frames 1900... |
|
[2025-03-21 00:08:37,527][00031] Num frames 2000... |
|
[2025-03-21 00:08:37,644][00031] Num frames 2100... |
|
[2025-03-21 00:08:37,764][00031] Num frames 2200... |
|
[2025-03-21 00:08:37,883][00031] Num frames 2300... |
|
[2025-03-21 00:08:38,006][00031] Num frames 2400... |
|
[2025-03-21 00:08:38,127][00031] Num frames 2500... |
|
[2025-03-21 00:08:38,250][00031] Num frames 2600... |
|
[2025-03-21 00:08:38,321][00031] Avg episode rewards: #0: 12.283, true rewards: #0: 6.532 |
|
[2025-03-21 00:08:38,323][00031] Avg episode reward: 12.283, avg true_objective: 6.532 |
|
[2025-03-21 00:08:38,426][00031] Num frames 2700... |
|
[2025-03-21 00:08:38,545][00031] Num frames 2800... |
|
[2025-03-21 00:08:38,671][00031] Num frames 2900... |
|
[2025-03-21 00:08:38,791][00031] Num frames 3000... |
|
[2025-03-21 00:08:38,921][00031] Num frames 3100... |
|
[2025-03-21 00:08:39,050][00031] Num frames 3200... |
|
[2025-03-21 00:08:39,194][00031] Num frames 3300... |
|
[2025-03-21 00:08:39,314][00031] Num frames 3400... |
|
[2025-03-21 00:08:39,437][00031] Num frames 3500... |
|
[2025-03-21 00:08:39,557][00031] Num frames 3600... |
|
[2025-03-21 00:08:39,675][00031] Num frames 3700... |
|
[2025-03-21 00:08:39,793][00031] Num frames 3800... |
|
[2025-03-21 00:08:39,915][00031] Num frames 3900... |
|
[2025-03-21 00:08:40,035][00031] Num frames 4000... |
|
[2025-03-21 00:08:40,159][00031] Num frames 4100... |
|
[2025-03-21 00:08:40,283][00031] Num frames 4200... |
|
[2025-03-21 00:08:40,405][00031] Num frames 4300... |
|
[2025-03-21 00:08:40,524][00031] Num frames 4400... |
|
[2025-03-21 00:08:40,646][00031] Num frames 4500... |
|
[2025-03-21 00:08:40,768][00031] Num frames 4600... |
|
[2025-03-21 00:08:40,891][00031] Num frames 4700... |
|
[2025-03-21 00:08:40,962][00031] Avg episode rewards: #0: 20.426, true rewards: #0: 9.426 |
|
[2025-03-21 00:08:40,963][00031] Avg episode reward: 20.426, avg true_objective: 9.426 |
|
[2025-03-21 00:08:41,065][00031] Num frames 4800... |
|
[2025-03-21 00:08:41,188][00031] Num frames 4900... |
|
[2025-03-21 00:08:41,313][00031] Num frames 5000... |
|
[2025-03-21 00:08:41,436][00031] Num frames 5100... |
|
[2025-03-21 00:08:41,558][00031] Num frames 5200... |
|
[2025-03-21 00:08:41,683][00031] Num frames 5300... |
|
[2025-03-21 00:08:41,811][00031] Num frames 5400... |
|
[2025-03-21 00:08:41,937][00031] Num frames 5500... |
|
[2025-03-21 00:08:42,064][00031] Num frames 5600... |
|
[2025-03-21 00:08:42,183][00031] Num frames 5700... |
|
[2025-03-21 00:08:42,303][00031] Num frames 5800... |
|
[2025-03-21 00:08:42,423][00031] Num frames 5900... |
|
[2025-03-21 00:08:42,548][00031] Num frames 6000... |
|
[2025-03-21 00:08:42,673][00031] Num frames 6100... |
|
[2025-03-21 00:08:42,801][00031] Num frames 6200... |
|
[2025-03-21 00:08:42,927][00031] Num frames 6300... |
|
[2025-03-21 00:08:43,055][00031] Num frames 6400... |
|
[2025-03-21 00:08:43,180][00031] Num frames 6500... |
|
[2025-03-21 00:08:43,307][00031] Num frames 6600... |
|
[2025-03-21 00:08:43,403][00031] Avg episode rewards: #0: 25.221, true rewards: #0: 11.055 |
|
[2025-03-21 00:08:43,404][00031] Avg episode reward: 25.221, avg true_objective: 11.055 |
|
[2025-03-21 00:08:43,481][00031] Num frames 6700... |
|
[2025-03-21 00:08:43,601][00031] Num frames 6800... |
|
[2025-03-21 00:08:43,720][00031] Num frames 6900... |
|
[2025-03-21 00:08:43,840][00031] Num frames 7000... |
|
[2025-03-21 00:08:43,966][00031] Num frames 7100... |
|
[2025-03-21 00:08:44,094][00031] Num frames 7200... |
|
[2025-03-21 00:08:44,218][00031] Num frames 7300... |
|
[2025-03-21 00:08:44,342][00031] Num frames 7400... |
|
[2025-03-21 00:08:44,466][00031] Num frames 7500... |
|
[2025-03-21 00:08:44,584][00031] Num frames 7600... |
|
[2025-03-21 00:08:44,703][00031] Num frames 7700... |
|
[2025-03-21 00:08:44,820][00031] Num frames 7800... |
|
[2025-03-21 00:08:44,935][00031] Avg episode rewards: #0: 25.356, true rewards: #0: 11.213 |
|
[2025-03-21 00:08:44,936][00031] Avg episode reward: 25.356, avg true_objective: 11.213 |
|
[2025-03-21 00:08:44,996][00031] Num frames 7900... |
|
[2025-03-21 00:08:45,115][00031] Num frames 8000... |
|
[2025-03-21 00:08:45,235][00031] Num frames 8100... |
|
[2025-03-21 00:08:45,360][00031] Num frames 8200... |
|
[2025-03-21 00:08:45,484][00031] Num frames 8300... |
|
[2025-03-21 00:08:45,612][00031] Num frames 8400... |
|
[2025-03-21 00:08:45,738][00031] Num frames 8500... |
|
[2025-03-21 00:08:45,861][00031] Num frames 8600... |
|
[2025-03-21 00:08:45,988][00031] Num frames 8700... |
|
[2025-03-21 00:08:46,113][00031] Num frames 8800... |
|
[2025-03-21 00:08:46,233][00031] Num frames 8900... |
|
[2025-03-21 00:08:46,350][00031] Num frames 9000... |
|
[2025-03-21 00:08:46,482][00031] Avg episode rewards: #0: 26.457, true rewards: #0: 11.332 |
|
[2025-03-21 00:08:46,483][00031] Avg episode reward: 26.457, avg true_objective: 11.332 |
|
[2025-03-21 00:08:46,524][00031] Num frames 9100... |
|
[2025-03-21 00:08:46,636][00031] Num frames 9200... |
|
[2025-03-21 00:08:46,754][00031] Num frames 9300... |
|
[2025-03-21 00:08:46,871][00031] Num frames 9400... |
|
[2025-03-21 00:08:46,987][00031] Num frames 9500... |
|
[2025-03-21 00:08:47,103][00031] Num frames 9600... |
|
[2025-03-21 00:08:47,224][00031] Num frames 9700... |
|
[2025-03-21 00:08:47,287][00031] Avg episode rewards: #0: 25.007, true rewards: #0: 10.784 |
|
[2025-03-21 00:08:47,288][00031] Avg episode reward: 25.007, avg true_objective: 10.784 |
|
[2025-03-21 00:08:47,398][00031] Num frames 9800... |
|
[2025-03-21 00:08:47,525][00031] Num frames 9900... |
|
[2025-03-21 00:08:47,649][00031] Num frames 10000... |
|
[2025-03-21 00:08:47,773][00031] Num frames 10100... |
|
[2025-03-21 00:08:47,898][00031] Num frames 10200... |
|
[2025-03-21 00:08:48,022][00031] Num frames 10300... |
|
[2025-03-21 00:08:48,145][00031] Num frames 10400... |
|
[2025-03-21 00:08:48,268][00031] Num frames 10500... |
|
[2025-03-21 00:08:48,388][00031] Num frames 10600... |
|
[2025-03-21 00:08:48,508][00031] Num frames 10700... |
|
[2025-03-21 00:08:48,610][00031] Avg episode rewards: #0: 24.938, true rewards: #0: 10.738 |
|
[2025-03-21 00:08:48,611][00031] Avg episode reward: 24.938, avg true_objective: 10.738 |
|
[2025-03-21 00:09:25,266][00031] Replay video saved to /kaggle/working/train_dir/default_experiment/replay.mp4! |
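
Each "Num frames N..." line marks another hundred frames of the current evaluation episode; once the 10 episodes finish, the collected RGB frames are encoded to replay.mp4. A sketch of the video-writing step, assuming imageio with ffmpeg support is available (35 fps is VizDoom's native frame rate):

import imageio

def save_replay(frames, path="replay.mp4", fps=35):
    # frames: list of HxWx3 uint8 RGB arrays collected during evaluation
    imageio.mimwrite(path, frames, fps=fps)
    print(f"Replay video saved to {path}!")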
|
[2025-03-21 00:10:17,506][00031] Loading existing experiment configuration from /kaggle/working/train_dir/default_experiment/config.json |
|
[2025-03-21 00:10:17,507][00031] Overriding arg 'num_workers' with value 1 passed from command line |
|
[2025-03-21 00:10:17,508][00031] Adding new argument 'no_render'=True that is not in the saved config file! |
|
[2025-03-21 00:10:17,509][00031] Adding new argument 'save_video'=True that is not in the saved config file! |
|
[2025-03-21 00:10:17,510][00031] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! |
|
[2025-03-21 00:10:17,511][00031] Adding new argument 'video_name'=None that is not in the saved config file! |
|
[2025-03-21 00:10:17,512][00031] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! |
|
[2025-03-21 00:10:17,513][00031] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! |
|
[2025-03-21 00:10:17,514][00031] Adding new argument 'push_to_hub'=True that is not in the saved config file! |
|
[2025-03-21 00:10:17,515][00031] Adding new argument 'hf_repository'='salym/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! |
|
[2025-03-21 00:10:17,516][00031] Adding new argument 'policy_index'=0 that is not in the saved config file! |
|
[2025-03-21 00:10:17,517][00031] Adding new argument 'eval_deterministic'=False that is not in the saved config file! |
|
[2025-03-21 00:10:17,517][00031] Adding new argument 'train_script'=None that is not in the saved config file! |
|
[2025-03-21 00:10:17,518][00031] Adding new argument 'enjoy_script'=None that is not in the saved config file! |
|
[2025-03-21 00:10:17,519][00031] Using frameskip 1 and render_action_repeat=4 for evaluation |
|
[2025-03-21 00:10:17,542][00031] RunningMeanStd input shape: (3, 72, 128) |
|
[2025-03-21 00:10:17,544][00031] RunningMeanStd input shape: (1,) |
|
[2025-03-21 00:10:17,555][00031] ConvEncoder: input_channels=3 |
|
[2025-03-21 00:10:17,590][00031] Conv encoder output size: 512 |
|
[2025-03-21 00:10:17,591][00031] Policy head output size: 512 |
|
[2025-03-21 00:10:17,610][00031] Loading state from checkpoint /kaggle/working/train_dir/default_experiment/checkpoint_p0/checkpoint_000001466_6004736.pth... |
|
[2025-03-21 00:10:18,061][00031] Num frames 100... |
|
[2025-03-21 00:10:18,191][00031] Num frames 200... |
|
[2025-03-21 00:10:18,311][00031] Num frames 300... |
|
[2025-03-21 00:10:18,435][00031] Num frames 400... |
|
[2025-03-21 00:10:18,557][00031] Num frames 500... |
|
[2025-03-21 00:10:18,681][00031] Num frames 600... |
|
[2025-03-21 00:10:18,803][00031] Num frames 700... |
|
[2025-03-21 00:10:18,939][00031] Num frames 800... |
|
[2025-03-21 00:10:19,070][00031] Num frames 900... |
|
[2025-03-21 00:10:19,200][00031] Num frames 1000... |
|
[2025-03-21 00:10:19,321][00031] Num frames 1100... |
|
[2025-03-21 00:10:19,438][00031] Num frames 1200... |
|
[2025-03-21 00:10:19,553][00031] Num frames 1300... |
|
[2025-03-21 00:10:19,669][00031] Num frames 1400... |
|
[2025-03-21 00:10:19,784][00031] Num frames 1500... |
|
[2025-03-21 00:10:19,879][00031] Avg episode rewards: #0: 32.360, true rewards: #0: 15.360 |
|
[2025-03-21 00:10:19,880][00031] Avg episode reward: 32.360, avg true_objective: 15.360 |
|
[2025-03-21 00:10:19,953][00031] Num frames 1600... |
|
[2025-03-21 00:10:20,069][00031] Num frames 1700... |
|
[2025-03-21 00:10:20,188][00031] Num frames 1800... |
|
[2025-03-21 00:10:20,303][00031] Num frames 1900... |
|
[2025-03-21 00:10:20,419][00031] Num frames 2000... |
|
[2025-03-21 00:10:20,539][00031] Num frames 2100... |
|
[2025-03-21 00:10:20,661][00031] Num frames 2200... |
|
[2025-03-21 00:10:20,783][00031] Num frames 2300... |
|
[2025-03-21 00:10:20,907][00031] Num frames 2400... |
|
[2025-03-21 00:10:21,031][00031] Num frames 2500... |
|
[2025-03-21 00:10:21,160][00031] Num frames 2600... |
|
[2025-03-21 00:10:21,283][00031] Avg episode rewards: #0: 30.775, true rewards: #0: 13.275 |
|
[2025-03-21 00:10:21,284][00031] Avg episode reward: 30.775, avg true_objective: 13.275 |
|
[2025-03-21 00:10:21,339][00031] Num frames 2700... |
|
[2025-03-21 00:10:21,456][00031] Num frames 2800... |
|
[2025-03-21 00:10:21,571][00031] Num frames 2900... |
|
[2025-03-21 00:10:21,692][00031] Num frames 3000... |
|
[2025-03-21 00:10:21,813][00031] Num frames 3100... |
|
[2025-03-21 00:10:21,931][00031] Num frames 3200... |
|
[2025-03-21 00:10:22,050][00031] Num frames 3300... |
|
[2025-03-21 00:10:22,167][00031] Num frames 3400... |
|
[2025-03-21 00:10:22,283][00031] Num frames 3500... |
|
[2025-03-21 00:10:22,401][00031] Num frames 3600... |
|
[2025-03-21 00:10:22,521][00031] Num frames 3700... |
|
[2025-03-21 00:10:22,638][00031] Num frames 3800... |
|
[2025-03-21 00:10:22,757][00031] Num frames 3900... |
|
[2025-03-21 00:10:22,882][00031] Avg episode rewards: #0: 32.197, true rewards: #0: 13.197 |
|
[2025-03-21 00:10:22,882][00031] Avg episode reward: 32.197, avg true_objective: 13.197 |
|
[2025-03-21 00:10:22,930][00031] Num frames 4000... |
|
[2025-03-21 00:10:23,046][00031] Num frames 4100... |
|
[2025-03-21 00:10:23,162][00031] Num frames 4200... |
|
[2025-03-21 00:10:23,280][00031] Num frames 4300... |
|
[2025-03-21 00:10:23,398][00031] Num frames 4400... |
|
[2025-03-21 00:10:23,519][00031] Num frames 4500... |
|
[2025-03-21 00:10:23,635][00031] Num frames 4600... |
|
[2025-03-21 00:10:23,799][00031] Avg episode rewards: #0: 27.488, true rewards: #0: 11.737 |
|
[2025-03-21 00:10:23,800][00031] Avg episode reward: 27.488, avg true_objective: 11.737 |
|
[2025-03-21 00:10:23,807][00031] Num frames 4700... |
|
[2025-03-21 00:10:23,928][00031] Num frames 4800... |
|
[2025-03-21 00:10:24,044][00031] Num frames 4900... |
|
[2025-03-21 00:10:24,162][00031] Num frames 5000... |
|
[2025-03-21 00:10:24,278][00031] Num frames 5100... |
|
[2025-03-21 00:10:24,394][00031] Num frames 5200... |
|
[2025-03-21 00:10:24,509][00031] Num frames 5300... |
|
[2025-03-21 00:10:24,625][00031] Num frames 5400... |
|
[2025-03-21 00:10:24,743][00031] Num frames 5500... |
|
[2025-03-21 00:10:24,867][00031] Num frames 5600... |
|
[2025-03-21 00:10:24,970][00031] Avg episode rewards: #0: 26.274, true rewards: #0: 11.274 |
|
[2025-03-21 00:10:24,971][00031] Avg episode reward: 26.274, avg true_objective: 11.274 |
|
[2025-03-21 00:10:25,048][00031] Num frames 5700... |
|
[2025-03-21 00:10:25,185][00031] Num frames 5800... |
|
[2025-03-21 00:10:25,305][00031] Num frames 5900... |
|
[2025-03-21 00:10:25,423][00031] Num frames 6000... |
|
[2025-03-21 00:10:25,545][00031] Num frames 6100... |
|
[2025-03-21 00:10:25,671][00031] Num frames 6200... |
|
[2025-03-21 00:10:25,800][00031] Num frames 6300... |
|
[2025-03-21 00:10:25,928][00031] Num frames 6400... |
|
[2025-03-21 00:10:26,052][00031] Num frames 6500... |
|
[2025-03-21 00:10:26,181][00031] Num frames 6600... |
|
[2025-03-21 00:10:26,307][00031] Num frames 6700... |
|
[2025-03-21 00:10:26,433][00031] Num frames 6800... |
|
[2025-03-21 00:10:26,557][00031] Num frames 6900... |
|
[2025-03-21 00:10:26,676][00031] Num frames 7000... |
|
[2025-03-21 00:10:26,800][00031] Num frames 7100... |
|
[2025-03-21 00:10:26,920][00031] Num frames 7200... |
|
[2025-03-21 00:10:27,039][00031] Num frames 7300... |
|
[2025-03-21 00:10:27,160][00031] Num frames 7400... |
|
[2025-03-21 00:10:27,276][00031] Num frames 7500... |
|
[2025-03-21 00:10:27,395][00031] Num frames 7600... |
|
[2025-03-21 00:10:27,512][00031] Num frames 7700... |
|
[2025-03-21 00:10:27,609][00031] Avg episode rewards: #0: 31.395, true rewards: #0: 12.895 |
|
[2025-03-21 00:10:27,610][00031] Avg episode reward: 31.395, avg true_objective: 12.895 |
|
[2025-03-21 00:10:27,681][00031] Num frames 7800... |
|
[2025-03-21 00:10:27,797][00031] Num frames 7900... |
|
[2025-03-21 00:10:27,912][00031] Num frames 8000... |
|
[2025-03-21 00:10:28,026][00031] Num frames 8100... |
|
[2025-03-21 00:10:28,106][00031] Avg episode rewards: #0: 27.459, true rewards: #0: 11.601 |
|
[2025-03-21 00:10:28,106][00031] Avg episode reward: 27.459, avg true_objective: 11.601 |
|
[2025-03-21 00:10:28,197][00031] Num frames 8200... |
|
[2025-03-21 00:10:28,318][00031] Num frames 8300... |
|
[2025-03-21 00:10:28,438][00031] Num frames 8400... |
|
[2025-03-21 00:10:28,559][00031] Num frames 8500... |
|
[2025-03-21 00:10:28,684][00031] Num frames 8600... |
|
[2025-03-21 00:10:28,803][00031] Num frames 8700... |
|
[2025-03-21 00:10:28,929][00031] Num frames 8800... |
|
[2025-03-21 00:10:29,062][00031] Num frames 8900... |
|
[2025-03-21 00:10:29,194][00031] Num frames 9000... |
|
[2025-03-21 00:10:29,311][00031] Num frames 9100... |
|
[2025-03-21 00:10:29,426][00031] Num frames 9200... |
|
[2025-03-21 00:10:29,545][00031] Num frames 9300... |
|
[2025-03-21 00:10:29,671][00031] Num frames 9400... |
|
[2025-03-21 00:10:29,795][00031] Num frames 9500... |
|
[2025-03-21 00:10:29,914][00031] Num frames 9600... |
|
[2025-03-21 00:10:30,032][00031] Num frames 9700... |
|
[2025-03-21 00:10:30,154][00031] Num frames 9800... |
|
[2025-03-21 00:10:30,282][00031] Num frames 9900... |
|
[2025-03-21 00:10:30,404][00031] Num frames 10000... |
|
[2025-03-21 00:10:30,528][00031] Num frames 10100... |
|
[2025-03-21 00:10:30,662][00031] Num frames 10200... |
|
[2025-03-21 00:10:30,745][00031] Avg episode rewards: #0: 31.276, true rewards: #0: 12.776 |
|
[2025-03-21 00:10:30,746][00031] Avg episode reward: 31.276, avg true_objective: 12.776 |
|
[2025-03-21 00:10:30,853][00031] Num frames 10300... |
|
[2025-03-21 00:10:30,982][00031] Num frames 10400... |
|
[2025-03-21 00:10:31,110][00031] Num frames 10500... |
|
[2025-03-21 00:10:31,234][00031] Num frames 10600... |
|
[2025-03-21 00:10:31,359][00031] Num frames 10700... |
|
[2025-03-21 00:10:31,485][00031] Num frames 10800... |
|
[2025-03-21 00:10:31,651][00031] Avg episode rewards: #0: 29.325, true rewards: #0: 12.103 |
|
[2025-03-21 00:10:31,652][00031] Avg episode reward: 29.325, avg true_objective: 12.103 |
|
[2025-03-21 00:10:31,661][00031] Num frames 10900... |
|
[2025-03-21 00:10:31,787][00031] Num frames 11000... |
|
[2025-03-21 00:10:31,907][00031] Num frames 11100... |
|
[2025-03-21 00:10:32,030][00031] Num frames 11200... |
|
[2025-03-21 00:10:32,148][00031] Num frames 11300... |
|
[2025-03-21 00:10:32,268][00031] Num frames 11400... |
|
[2025-03-21 00:10:32,386][00031] Num frames 11500... |
|
[2025-03-21 00:10:32,510][00031] Num frames 11600... |
|
[2025-03-21 00:10:32,635][00031] Num frames 11700... |
|
[2025-03-21 00:10:32,757][00031] Num frames 11800... |
|
[2025-03-21 00:10:32,879][00031] Num frames 11900... |
|
[2025-03-21 00:10:33,001][00031] Num frames 12000... |
|
[2025-03-21 00:10:33,129][00031] Num frames 12100... |
|
[2025-03-21 00:10:33,281][00031] Avg episode rewards: #0: 29.273, true rewards: #0: 12.173 |
|
[2025-03-21 00:10:33,282][00031] Avg episode reward: 29.273, avg true_objective: 12.173 |
|
[2025-03-21 00:11:13,840][00031] Replay video saved to /kaggle/working/train_dir/default_experiment/replay.mp4! |
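
This second run was launched with push_to_hub=True and hf_repository='salym/rl_course_vizdoom_health_gathering_supreme', so after the replay is written the experiment directory (config, checkpoints, replay.mp4) gets uploaded to the Hugging Face Hub. One way to perform that upload with the huggingface_hub client; this is a sketch, not necessarily the exact call the enjoy script makes:

from huggingface_hub import HfApi

repo_id = "salym/rl_course_vizdoom_health_gathering_supreme"

api = HfApi()
api.create_repo(repo_id=repo_id, exist_ok=True)
api.upload_folder(
    folder_path="/kaggle/working/train_dir/default_experiment",
    repo_id=repo_id,
    repo_type="model",
)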
|
|