[2024-09-18 11:19:19,246][00268] Saving configuration to /content/train_dir/default_experiment/config.json...
[2024-09-18 11:19:19,248][00268] Rollout worker 0 uses device cpu
[2024-09-18 11:19:19,250][00268] Rollout worker 1 uses device cpu
[2024-09-18 11:19:19,252][00268] Rollout worker 2 uses device cpu
[2024-09-18 11:19:19,253][00268] Rollout worker 3 uses device cpu
[2024-09-18 11:19:19,254][00268] Rollout worker 4 uses device cpu
[2024-09-18 11:19:19,259][00268] Rollout worker 5 uses device cpu
[2024-09-18 11:19:19,260][00268] Rollout worker 6 uses device cpu
[2024-09-18 11:19:19,261][00268] Rollout worker 7 uses device cpu
[2024-09-18 11:19:19,404][00268] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-18 11:19:19,405][00268] InferenceWorker_p0-w0: min num requests: 2
[2024-09-18 11:19:19,438][00268] Starting all processes...
[2024-09-18 11:19:19,439][00268] Starting process learner_proc0
[2024-09-18 11:19:19,488][00268] Starting all processes...
[2024-09-18 11:19:19,499][00268] Starting process inference_proc0-0
[2024-09-18 11:19:19,500][00268] Starting process rollout_proc0
[2024-09-18 11:19:19,500][00268] Starting process rollout_proc1
[2024-09-18 11:19:19,500][00268] Starting process rollout_proc2
[2024-09-18 11:19:19,500][00268] Starting process rollout_proc3
[2024-09-18 11:19:19,500][00268] Starting process rollout_proc4
[2024-09-18 11:19:19,500][00268] Starting process rollout_proc5
[2024-09-18 11:19:19,500][00268] Starting process rollout_proc6
[2024-09-18 11:19:19,500][00268] Starting process rollout_proc7
[2024-09-18 11:19:30,928][03589] Worker 5 uses CPU cores [1]
[2024-09-18 11:19:31,020][03587] Worker 3 uses CPU cores [1]
[2024-09-18 11:19:31,023][03570] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-18 11:19:31,023][03570] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2024-09-18 11:19:31,027][03588] Worker 6 uses CPU cores [0]
[2024-09-18 11:19:31,045][03584] Worker 0 uses CPU cores [0]
[2024-09-18 11:19:31,050][03586] Worker 2 uses CPU cores [0]
[2024-09-18 11:19:31,070][03591] Worker 7 uses CPU cores [1]
[2024-09-18 11:19:31,082][03570] Num visible devices: 1
[2024-09-18 11:19:31,116][03570] Starting seed is not provided
[2024-09-18 11:19:31,117][03570] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-18 11:19:31,118][03570] Initializing actor-critic model on device cuda:0
[2024-09-18 11:19:31,120][03570] RunningMeanStd input shape: (3, 72, 128)
[2024-09-18 11:19:31,121][03570] RunningMeanStd input shape: (1,)
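The two `RunningMeanStd input shape` lines above refer to running normalizers: one over image observations of shape `(3, 72, 128)` and one over scalar returns of shape `(1,)`. A minimal sketch of how such a running mean/std can be maintained (a parallel-variance update; this is an illustration, not Sample Factory's actual in-place implementation):

```python
import numpy as np

class RunningMeanStd:
    """Per-element running mean/variance, e.g. shape (3, 72, 128) for obs or (1,) for returns."""

    def __init__(self, shape, eps=1e-4):
        self.mean = np.zeros(shape, dtype=np.float64)
        self.var = np.ones(shape, dtype=np.float64)
        self.count = eps  # small prior count avoids division by zero on the first update

    def update(self, batch):
        # batch has shape (N, *shape); merge batch statistics into the running ones
        b_mean = batch.mean(axis=0)
        b_var = batch.var(axis=0)
        b_count = batch.shape[0]
        delta = b_mean - self.mean
        tot = self.count + b_count
        self.mean = self.mean + delta * b_count / tot
        m_a = self.var * self.count
        m_b = b_var * b_count
        self.var = (m_a + m_b + delta ** 2 * self.count * b_count / tot) / tot
        self.count = tot

    def normalize(self, x):
        return (x - self.mean) / np.sqrt(self.var + 1e-8)
```

Normalizing observations and returns this way keeps network inputs and value targets in a stable range regardless of the environment's raw scales.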
[2024-09-18 11:19:31,132][03583] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-18 11:19:31,132][03583] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2024-09-18 11:19:31,133][03590] Worker 4 uses CPU cores [0]
[2024-09-18 11:19:31,150][03583] Num visible devices: 1
[2024-09-18 11:19:31,154][03570] ConvEncoder: input_channels=3
[2024-09-18 11:19:31,183][03585] Worker 1 uses CPU cores [1]
[2024-09-18 11:19:31,328][03570] Conv encoder output size: 512
[2024-09-18 11:19:31,328][03570] Policy head output size: 512
[2024-09-18 11:19:31,344][03570] Created Actor Critic model with architecture:
[2024-09-18 11:19:31,344][03570] ActorCriticSharedWeights(
(obs_normalizer): ObservationNormalizer(
(running_mean_std): RunningMeanStdDictInPlace(
(running_mean_std): ModuleDict(
(obs): RunningMeanStdInPlace()
)
)
)
(returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
(encoder): VizdoomEncoder(
(basic_encoder): ConvEncoder(
(enc): RecursiveScriptModule(
original_name=ConvEncoderImpl
(conv_head): RecursiveScriptModule(
original_name=Sequential
(0): RecursiveScriptModule(original_name=Conv2d)
(1): RecursiveScriptModule(original_name=ELU)
(2): RecursiveScriptModule(original_name=Conv2d)
(3): RecursiveScriptModule(original_name=ELU)
(4): RecursiveScriptModule(original_name=Conv2d)
(5): RecursiveScriptModule(original_name=ELU)
)
(mlp_layers): RecursiveScriptModule(
original_name=Sequential
(0): RecursiveScriptModule(original_name=Linear)
(1): RecursiveScriptModule(original_name=ELU)
)
)
)
)
(core): ModelCoreRNN(
(core): GRU(512, 512)
)
(decoder): MlpDecoder(
(mlp): Identity()
)
(critic_linear): Linear(in_features=512, out_features=1, bias=True)
(action_parameterization): ActionParameterizationDefault(
(distribution_linear): Linear(in_features=512, out_features=5, bias=True)
)
)
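The architecture printout above can be restated as a compact PyTorch module: three Conv2d+ELU stages, a Linear+ELU projection to the 512-dim encoder output, a GRU(512, 512) core, a 1-unit critic head, and a 5-unit action head. The conv kernel sizes and strides below are assumptions (Sample Factory's default simple convnet); only the layer types and the 512/1/5 sizes appear in the log itself:

```python
import torch
import torch.nn as nn

class ActorCriticSketch(nn.Module):
    """Sketch of the logged ActorCriticSharedWeights; conv hyperparameters are assumed."""

    def __init__(self, num_actions=5):
        super().__init__()
        self.conv_head = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ELU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ELU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=2), nn.ELU(),
        )
        # for a (3, 72, 128) input the conv stack above yields (128, 3, 6) -> 2304 features
        self.mlp_layers = nn.Sequential(nn.Linear(128 * 3 * 6, 512), nn.ELU())
        self.core = nn.GRU(512, 512)
        self.critic_linear = nn.Linear(512, 1)
        self.distribution_linear = nn.Linear(512, num_actions)

    def forward(self, obs, rnn_state):
        # obs: (B, 3, 72, 128) normalized observations; rnn_state: (1, B, 512)
        x = self.conv_head(obs).flatten(1)
        x = self.mlp_layers(x)
        x, new_state = self.core(x.unsqueeze(0), rnn_state)
        x = x.squeeze(0)
        return self.distribution_linear(x), self.critic_linear(x), new_state
```

The shared-weights design means one encoder/core serves both the policy (action logits) and the value estimate, which is what "ActorCriticSharedWeights" denotes.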
[2024-09-18 11:19:35,560][03570] Using optimizer <class 'torch.optim.adam.Adam'>
[2024-09-18 11:19:35,561][03570] No checkpoints found
[2024-09-18 11:19:35,561][03570] Did not load from checkpoint, starting from scratch!
[2024-09-18 11:19:35,561][03570] Initialized policy 0 weights for model version 0
[2024-09-18 11:19:35,564][03570] LearnerWorker_p0 finished initialization!
[2024-09-18 11:19:35,566][03570] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-18 11:19:35,698][03583] RunningMeanStd input shape: (3, 72, 128)
[2024-09-18 11:19:35,700][03583] RunningMeanStd input shape: (1,)
[2024-09-18 11:19:35,715][03583] ConvEncoder: input_channels=3
[2024-09-18 11:19:35,831][03583] Conv encoder output size: 512
[2024-09-18 11:19:35,831][03583] Policy head output size: 512
[2024-09-18 11:19:35,848][00268] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-09-18 11:19:37,362][00268] Inference worker 0-0 is ready!
[2024-09-18 11:19:37,364][00268] All inference workers are ready! Signal rollout workers to start!
[2024-09-18 11:19:37,488][03584] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-18 11:19:37,491][03590] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-18 11:19:37,506][03591] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-18 11:19:37,511][03589] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-18 11:19:37,510][03587] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-18 11:19:37,513][03586] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-18 11:19:37,522][03588] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-18 11:19:37,527][03585] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-18 11:19:38,851][03590] Decorrelating experience for 0 frames...
[2024-09-18 11:19:38,853][03586] Decorrelating experience for 0 frames...
[2024-09-18 11:19:38,851][03585] Decorrelating experience for 0 frames...
[2024-09-18 11:19:38,853][03587] Decorrelating experience for 0 frames...
[2024-09-18 11:19:38,854][03589] Decorrelating experience for 0 frames...
[2024-09-18 11:19:39,396][00268] Heartbeat connected on Batcher_0
[2024-09-18 11:19:39,403][00268] Heartbeat connected on LearnerWorker_p0
[2024-09-18 11:19:39,432][00268] Heartbeat connected on InferenceWorker_p0-w0
[2024-09-18 11:19:39,610][03586] Decorrelating experience for 32 frames...
[2024-09-18 11:19:39,615][03590] Decorrelating experience for 32 frames...
[2024-09-18 11:19:39,628][03589] Decorrelating experience for 32 frames...
[2024-09-18 11:19:39,630][03587] Decorrelating experience for 32 frames...
[2024-09-18 11:19:40,574][03586] Decorrelating experience for 64 frames...
[2024-09-18 11:19:40,590][03590] Decorrelating experience for 64 frames...
[2024-09-18 11:19:40,615][03585] Decorrelating experience for 32 frames...
[2024-09-18 11:19:40,731][03589] Decorrelating experience for 64 frames...
[2024-09-18 11:19:40,847][00268] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-09-18 11:19:41,437][03590] Decorrelating experience for 96 frames...
[2024-09-18 11:19:41,564][00268] Heartbeat connected on RolloutWorker_w4
[2024-09-18 11:19:41,567][03586] Decorrelating experience for 96 frames...
[2024-09-18 11:19:41,693][00268] Heartbeat connected on RolloutWorker_w2
[2024-09-18 11:19:41,778][03587] Decorrelating experience for 64 frames...
[2024-09-18 11:19:41,880][03585] Decorrelating experience for 64 frames...
[2024-09-18 11:19:41,926][03589] Decorrelating experience for 96 frames...
[2024-09-18 11:19:42,039][00268] Heartbeat connected on RolloutWorker_w5
[2024-09-18 11:19:42,275][03585] Decorrelating experience for 96 frames...
[2024-09-18 11:19:42,335][00268] Heartbeat connected on RolloutWorker_w1
[2024-09-18 11:19:42,585][03587] Decorrelating experience for 96 frames...
[2024-09-18 11:19:42,645][00268] Heartbeat connected on RolloutWorker_w3
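The "Decorrelating experience for N frames" lines show each rollout worker stepping its environments through some random-action frames before real collection starts, so trajectories across workers are not phase-aligned. The exact schedule is internal to Sample Factory; a sketch of the idea, with a worker-index-proportional frame count (an assumption) and a generic gym-style environment API:

```python
def decorrelate(env, worker_index, frames_per_step=32):
    """Step the env through a worker-dependent number of random-action frames
    (0, 32, 64, 96, ... as in the log) before experience collection begins."""
    num_frames = worker_index * frames_per_step
    obs = env.reset()
    for _ in range(num_frames):
        obs, reward, done, info = env.step(env.action_space.sample())
        if done:
            obs = env.reset()
    return obs
```

Without this staggering, all workers would hit episode boundaries at roughly the same time, producing correlated batches that hurt early learning.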
[2024-09-18 11:19:45,848][00268] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 2.4. Samples: 24. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-09-18 11:19:45,850][00268] Avg episode reward: [(0, '2.020')]
[2024-09-18 11:19:48,873][03570] Signal inference workers to stop experience collection...
[2024-09-18 11:19:48,882][03583] InferenceWorker_p0-w0: stopping experience collection
[2024-09-18 11:19:50,018][03570] Signal inference workers to resume experience collection...
[2024-09-18 11:19:50,020][03583] InferenceWorker_p0-w0: resuming experience collection
[2024-09-18 11:19:50,848][00268] Fps is (10 sec: 409.6, 60 sec: 273.1, 300 sec: 273.1). Total num frames: 4096. Throughput: 0: 165.3. Samples: 2480. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2024-09-18 11:19:50,850][00268] Avg episode reward: [(0, '3.103')]
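The recurring "Fps is (10 sec: ..., 60 sec: ..., 300 sec: ...)" lines report frame throughput over three trailing time windows. A minimal sketch of that windowed computation (an illustration, not Sample Factory's actual reporting code):

```python
from collections import deque
import math

class FpsTracker:
    """Windowed FPS like the log's '(10 sec: ..., 60 sec: ..., 300 sec: ...)' readout."""

    def __init__(self, max_window=300):
        self.max_window = max_window
        self.samples = deque()  # (timestamp_seconds, total_frames)

    def record(self, now, total_frames):
        self.samples.append((now, total_frames))
        # keep only what the largest window needs
        while self.samples and now - self.samples[0][0] > self.max_window:
            self.samples.popleft()

    def fps(self, now, window):
        past = [(t, f) for t, f in self.samples if now - t <= window]
        if len(past) < 2:
            return math.nan  # matches the 'nan' in the very first status line
        (t0, f0), (t1, f1) = past[0], past[-1]
        return (f1 - f0) / (t1 - t0) if t1 > t0 else math.nan
```

This explains why the first status line shows `nan` for all windows: no frame deltas exist yet.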
[2024-09-18 11:19:55,848][00268] Fps is (10 sec: 2867.2, 60 sec: 1433.6, 300 sec: 1433.6). Total num frames: 28672. Throughput: 0: 359.4. Samples: 7188. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:19:55,849][00268] Avg episode reward: [(0, '4.019')]
[2024-09-18 11:19:59,169][03583] Updated weights for policy 0, policy_version 10 (0.0362)
[2024-09-18 11:20:00,849][00268] Fps is (10 sec: 4095.3, 60 sec: 1802.1, 300 sec: 1802.1). Total num frames: 45056. Throughput: 0: 396.1. Samples: 9904. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-18 11:20:00,852][00268] Avg episode reward: [(0, '4.359')]
[2024-09-18 11:20:05,848][00268] Fps is (10 sec: 2867.2, 60 sec: 1911.5, 300 sec: 1911.5). Total num frames: 57344. Throughput: 0: 479.5. Samples: 14384. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-18 11:20:05,855][00268] Avg episode reward: [(0, '4.327')]
[2024-09-18 11:20:10,848][00268] Fps is (10 sec: 3277.4, 60 sec: 2223.5, 300 sec: 2223.5). Total num frames: 77824. Throughput: 0: 579.5. Samples: 20284. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-18 11:20:10,860][00268] Avg episode reward: [(0, '4.429')]
[2024-09-18 11:20:10,933][03583] Updated weights for policy 0, policy_version 20 (0.0018)
[2024-09-18 11:20:15,848][00268] Fps is (10 sec: 4096.0, 60 sec: 2457.6, 300 sec: 2457.6). Total num frames: 98304. Throughput: 0: 585.0. Samples: 23400. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:20:15,852][00268] Avg episode reward: [(0, '4.569')]
[2024-09-18 11:20:20,849][00268] Fps is (10 sec: 3685.8, 60 sec: 2548.5, 300 sec: 2548.5). Total num frames: 114688. Throughput: 0: 625.6. Samples: 28152. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-18 11:20:20,851][00268] Avg episode reward: [(0, '4.446')]
[2024-09-18 11:20:20,858][03570] Saving new best policy, reward=4.446!
[2024-09-18 11:20:22,873][03583] Updated weights for policy 0, policy_version 30 (0.0015)
[2024-09-18 11:20:25,850][00268] Fps is (10 sec: 3685.4, 60 sec: 2703.2, 300 sec: 2703.2). Total num frames: 135168. Throughput: 0: 746.9. Samples: 33612. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:20:25,855][00268] Avg episode reward: [(0, '4.500')]
[2024-09-18 11:20:25,863][03570] Saving new best policy, reward=4.500!
[2024-09-18 11:20:30,848][00268] Fps is (10 sec: 3687.0, 60 sec: 2755.5, 300 sec: 2755.5). Total num frames: 151552. Throughput: 0: 813.5. Samples: 36632. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:20:30,856][00268] Avg episode reward: [(0, '4.542')]
[2024-09-18 11:20:30,859][03570] Saving new best policy, reward=4.542!
[2024-09-18 11:20:33,994][03583] Updated weights for policy 0, policy_version 40 (0.0014)
[2024-09-18 11:20:35,848][00268] Fps is (10 sec: 3277.7, 60 sec: 2798.9, 300 sec: 2798.9). Total num frames: 167936. Throughput: 0: 865.9. Samples: 41444. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-18 11:20:35,854][00268] Avg episode reward: [(0, '4.331')]
[2024-09-18 11:20:40,848][00268] Fps is (10 sec: 3276.8, 60 sec: 3072.0, 300 sec: 2835.7). Total num frames: 184320. Throughput: 0: 877.5. Samples: 46674. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:20:40,857][00268] Avg episode reward: [(0, '4.349')]
[2024-09-18 11:20:45,029][03583] Updated weights for policy 0, policy_version 50 (0.0014)
[2024-09-18 11:20:45,848][00268] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 2925.7). Total num frames: 204800. Throughput: 0: 886.3. Samples: 49786. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:20:45,853][00268] Avg episode reward: [(0, '4.333')]
[2024-09-18 11:20:50,847][00268] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 2949.1). Total num frames: 221184. Throughput: 0: 902.4. Samples: 54994. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-18 11:20:50,858][00268] Avg episode reward: [(0, '4.517')]
[2024-09-18 11:20:55,848][00268] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 2969.6). Total num frames: 237568. Throughput: 0: 883.3. Samples: 60032. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-18 11:20:55,850][00268] Avg episode reward: [(0, '4.437')]
[2024-09-18 11:20:57,080][03583] Updated weights for policy 0, policy_version 60 (0.0017)
[2024-09-18 11:21:00,848][00268] Fps is (10 sec: 3686.4, 60 sec: 3550.0, 300 sec: 3035.9). Total num frames: 258048. Throughput: 0: 883.9. Samples: 63174. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:21:00,855][00268] Avg episode reward: [(0, '4.298')]
[2024-09-18 11:21:05,851][00268] Fps is (10 sec: 3685.0, 60 sec: 3617.9, 300 sec: 3049.1). Total num frames: 274432. Throughput: 0: 897.5. Samples: 68540. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-18 11:21:05,857][00268] Avg episode reward: [(0, '4.370')]
[2024-09-18 11:21:09,049][03583] Updated weights for policy 0, policy_version 70 (0.0014)
[2024-09-18 11:21:10,848][00268] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3061.2). Total num frames: 290816. Throughput: 0: 882.0. Samples: 73298. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-09-18 11:21:10,856][00268] Avg episode reward: [(0, '4.459')]
[2024-09-18 11:21:15,848][00268] Fps is (10 sec: 3687.8, 60 sec: 3549.9, 300 sec: 3113.0). Total num frames: 311296. Throughput: 0: 884.7. Samples: 76442. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:21:15,851][00268] Avg episode reward: [(0, '4.622')]
[2024-09-18 11:21:15,872][03570] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000077_315392.pth...
[2024-09-18 11:21:15,997][03570] Saving new best policy, reward=4.622!
[2024-09-18 11:21:19,783][03583] Updated weights for policy 0, policy_version 80 (0.0014)
[2024-09-18 11:21:20,848][00268] Fps is (10 sec: 3686.4, 60 sec: 3550.0, 300 sec: 3120.8). Total num frames: 327680. Throughput: 0: 900.2. Samples: 81952. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:21:20,850][00268] Avg episode reward: [(0, '4.606')]
[2024-09-18 11:21:25,848][00268] Fps is (10 sec: 3276.8, 60 sec: 3481.8, 300 sec: 3127.9). Total num frames: 344064. Throughput: 0: 884.8. Samples: 86492. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-18 11:21:25,850][00268] Avg episode reward: [(0, '4.602')]
[2024-09-18 11:21:30,848][00268] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3169.9). Total num frames: 364544. Throughput: 0: 884.0. Samples: 89568. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-18 11:21:30,850][00268] Avg episode reward: [(0, '4.565')]
[2024-09-18 11:21:31,357][03583] Updated weights for policy 0, policy_version 90 (0.0016)
[2024-09-18 11:21:35,850][00268] Fps is (10 sec: 3685.6, 60 sec: 3549.7, 300 sec: 3174.3). Total num frames: 380928. Throughput: 0: 896.2. Samples: 95324. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:21:35,852][00268] Avg episode reward: [(0, '4.454')]
[2024-09-18 11:21:40,848][00268] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3178.5). Total num frames: 397312. Throughput: 0: 875.7. Samples: 99440. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:21:40,850][00268] Avg episode reward: [(0, '4.530')]
[2024-09-18 11:21:43,611][03583] Updated weights for policy 0, policy_version 100 (0.0018)
[2024-09-18 11:21:45,848][00268] Fps is (10 sec: 3687.2, 60 sec: 3549.9, 300 sec: 3213.8). Total num frames: 417792. Throughput: 0: 872.4. Samples: 102432. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-18 11:21:45,854][00268] Avg episode reward: [(0, '4.457')]
[2024-09-18 11:21:50,848][00268] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3216.1). Total num frames: 434176. Throughput: 0: 891.9. Samples: 108674. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:21:50,851][00268] Avg episode reward: [(0, '4.456')]
[2024-09-18 11:21:55,771][03583] Updated weights for policy 0, policy_version 110 (0.0024)
[2024-09-18 11:21:55,848][00268] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3218.3). Total num frames: 450560. Throughput: 0: 873.3. Samples: 112598. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-18 11:21:55,855][00268] Avg episode reward: [(0, '4.704')]
[2024-09-18 11:21:55,868][03570] Saving new best policy, reward=4.704!
[2024-09-18 11:22:00,847][00268] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3220.3). Total num frames: 466944. Throughput: 0: 869.3. Samples: 115560. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-18 11:22:00,854][00268] Avg episode reward: [(0, '4.757')]
[2024-09-18 11:22:00,947][03570] Saving new best policy, reward=4.757!
[2024-09-18 11:22:05,848][00268] Fps is (10 sec: 3686.4, 60 sec: 3550.1, 300 sec: 3249.5). Total num frames: 487424. Throughput: 0: 879.7. Samples: 121538. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-18 11:22:05,851][00268] Avg episode reward: [(0, '4.508')]
[2024-09-18 11:22:06,489][03583] Updated weights for policy 0, policy_version 120 (0.0017)
[2024-09-18 11:22:10,848][00268] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3223.9). Total num frames: 499712. Throughput: 0: 876.0. Samples: 125910. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-18 11:22:10,857][00268] Avg episode reward: [(0, '4.328')]
[2024-09-18 11:22:15,848][00268] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3251.2). Total num frames: 520192. Throughput: 0: 866.6. Samples: 128566. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-18 11:22:15,850][00268] Avg episode reward: [(0, '4.399')]
[2024-09-18 11:22:18,208][03583] Updated weights for policy 0, policy_version 130 (0.0019)
[2024-09-18 11:22:20,848][00268] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3276.8). Total num frames: 540672. Throughput: 0: 875.0. Samples: 134696. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-18 11:22:20,855][00268] Avg episode reward: [(0, '4.406')]
[2024-09-18 11:22:25,848][00268] Fps is (10 sec: 3686.3, 60 sec: 3549.9, 300 sec: 3276.8). Total num frames: 557056. Throughput: 0: 887.0. Samples: 139356. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-18 11:22:25,853][00268] Avg episode reward: [(0, '4.523')]
[2024-09-18 11:22:30,379][03583] Updated weights for policy 0, policy_version 140 (0.0021)
[2024-09-18 11:22:30,847][00268] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3276.8). Total num frames: 573440. Throughput: 0: 873.2. Samples: 141726. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-18 11:22:30,850][00268] Avg episode reward: [(0, '4.425')]
[2024-09-18 11:22:35,848][00268] Fps is (10 sec: 3686.5, 60 sec: 3550.0, 300 sec: 3299.6). Total num frames: 593920. Throughput: 0: 868.0. Samples: 147734. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-18 11:22:35,850][00268] Avg episode reward: [(0, '4.410')]
[2024-09-18 11:22:40,847][00268] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3298.9). Total num frames: 610304. Throughput: 0: 890.3. Samples: 152660. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-18 11:22:40,855][00268] Avg episode reward: [(0, '4.373')]
[2024-09-18 11:22:42,334][03583] Updated weights for policy 0, policy_version 150 (0.0012)
[2024-09-18 11:22:45,848][00268] Fps is (10 sec: 3276.7, 60 sec: 3481.6, 300 sec: 3298.4). Total num frames: 626688. Throughput: 0: 872.3. Samples: 154812. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-18 11:22:45,850][00268] Avg episode reward: [(0, '4.299')]
[2024-09-18 11:22:50,848][00268] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3318.8). Total num frames: 647168. Throughput: 0: 874.9. Samples: 160910. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:22:50,852][00268] Avg episode reward: [(0, '4.312')]
[2024-09-18 11:22:52,666][03583] Updated weights for policy 0, policy_version 160 (0.0012)
[2024-09-18 11:22:55,849][00268] Fps is (10 sec: 3686.0, 60 sec: 3549.8, 300 sec: 3317.7). Total num frames: 663552. Throughput: 0: 894.5. Samples: 166162. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:22:55,854][00268] Avg episode reward: [(0, '4.478')]
[2024-09-18 11:23:00,848][00268] Fps is (10 sec: 3276.7, 60 sec: 3549.9, 300 sec: 3316.8). Total num frames: 679936. Throughput: 0: 878.4. Samples: 168094. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:23:00,851][00268] Avg episode reward: [(0, '4.674')]
[2024-09-18 11:23:04,871][03583] Updated weights for policy 0, policy_version 170 (0.0021)
[2024-09-18 11:23:05,850][00268] Fps is (10 sec: 3686.1, 60 sec: 3549.7, 300 sec: 3335.3). Total num frames: 700416. Throughput: 0: 874.2. Samples: 174038. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-18 11:23:05,853][00268] Avg episode reward: [(0, '4.713')]
[2024-09-18 11:23:10,848][00268] Fps is (10 sec: 3686.5, 60 sec: 3618.1, 300 sec: 3334.0). Total num frames: 716800. Throughput: 0: 891.6. Samples: 179478. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:23:10,850][00268] Avg episode reward: [(0, '4.564')]
[2024-09-18 11:23:15,848][00268] Fps is (10 sec: 3277.5, 60 sec: 3549.9, 300 sec: 3332.7). Total num frames: 733184. Throughput: 0: 883.7. Samples: 181492. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:23:15,850][00268] Avg episode reward: [(0, '4.597')]
[2024-09-18 11:23:15,859][03570] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000179_733184.pth...
[2024-09-18 11:23:16,864][03583] Updated weights for policy 0, policy_version 180 (0.0014)
[2024-09-18 11:23:20,848][00268] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3349.6). Total num frames: 753664. Throughput: 0: 880.8. Samples: 187368. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:23:20,853][00268] Avg episode reward: [(0, '4.606')]
[2024-09-18 11:23:25,849][00268] Fps is (10 sec: 3685.9, 60 sec: 3549.8, 300 sec: 3348.0). Total num frames: 770048. Throughput: 0: 898.6. Samples: 193098. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:23:25,853][00268] Avg episode reward: [(0, '4.436')]
[2024-09-18 11:23:28,042][03583] Updated weights for policy 0, policy_version 190 (0.0016)
[2024-09-18 11:23:30,848][00268] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3329.1). Total num frames: 782336. Throughput: 0: 895.6. Samples: 195112. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:23:30,852][00268] Avg episode reward: [(0, '4.604')]
[2024-09-18 11:23:35,848][00268] Fps is (10 sec: 3687.0, 60 sec: 3549.9, 300 sec: 3362.1). Total num frames: 806912. Throughput: 0: 880.8. Samples: 200546. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-09-18 11:23:35,850][00268] Avg episode reward: [(0, '4.580')]
[2024-09-18 11:23:38,775][03583] Updated weights for policy 0, policy_version 200 (0.0018)
[2024-09-18 11:23:40,850][00268] Fps is (10 sec: 4094.9, 60 sec: 3549.7, 300 sec: 3360.4). Total num frames: 823296. Throughput: 0: 899.0. Samples: 206620. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-18 11:23:40,860][00268] Avg episode reward: [(0, '4.477')]
[2024-09-18 11:23:45,850][00268] Fps is (10 sec: 2866.4, 60 sec: 3481.5, 300 sec: 3342.3). Total num frames: 835584. Throughput: 0: 900.3. Samples: 208610. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-18 11:23:45,857][00268] Avg episode reward: [(0, '4.556')]
[2024-09-18 11:23:50,848][00268] Fps is (10 sec: 3277.7, 60 sec: 3481.6, 300 sec: 3357.1). Total num frames: 856064. Throughput: 0: 884.4. Samples: 213832. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-18 11:23:50,854][00268] Avg episode reward: [(0, '4.837')]
[2024-09-18 11:23:50,859][03570] Saving new best policy, reward=4.837!
[2024-09-18 11:23:51,116][03583] Updated weights for policy 0, policy_version 210 (0.0014)
[2024-09-18 11:23:55,848][00268] Fps is (10 sec: 4097.0, 60 sec: 3549.9, 300 sec: 3371.3). Total num frames: 876544. Throughput: 0: 897.6. Samples: 219872. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-18 11:23:55,854][00268] Avg episode reward: [(0, '5.091')]
[2024-09-18 11:23:55,865][03570] Saving new best policy, reward=5.091!
[2024-09-18 11:24:00,848][00268] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3369.5). Total num frames: 892928. Throughput: 0: 896.1. Samples: 221818. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-18 11:24:00,854][00268] Avg episode reward: [(0, '5.288')]
[2024-09-18 11:24:00,857][03570] Saving new best policy, reward=5.288!
[2024-09-18 11:24:03,292][03583] Updated weights for policy 0, policy_version 220 (0.0015)
[2024-09-18 11:24:05,848][00268] Fps is (10 sec: 3276.9, 60 sec: 3481.7, 300 sec: 3367.8). Total num frames: 909312. Throughput: 0: 874.4. Samples: 226714. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:24:05,856][00268] Avg episode reward: [(0, '5.011')]
[2024-09-18 11:24:10,848][00268] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3381.1). Total num frames: 929792. Throughput: 0: 883.6. Samples: 232860. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0)
[2024-09-18 11:24:10,850][00268] Avg episode reward: [(0, '4.666')]
[2024-09-18 11:24:14,497][03583] Updated weights for policy 0, policy_version 230 (0.0013)
[2024-09-18 11:24:15,848][00268] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3364.6). Total num frames: 942080. Throughput: 0: 891.6. Samples: 235232. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:24:15,854][00268] Avg episode reward: [(0, '4.724')]
[2024-09-18 11:24:20,848][00268] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3377.4). Total num frames: 962560. Throughput: 0: 876.4. Samples: 239984. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-18 11:24:20,852][00268] Avg episode reward: [(0, '4.824')]
[2024-09-18 11:24:25,284][03583] Updated weights for policy 0, policy_version 240 (0.0012)
[2024-09-18 11:24:25,848][00268] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3389.8). Total num frames: 983040. Throughput: 0: 878.1. Samples: 246134. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:24:25,854][00268] Avg episode reward: [(0, '4.984')]
[2024-09-18 11:24:30,848][00268] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3387.9). Total num frames: 999424. Throughput: 0: 890.8. Samples: 248694. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:24:30,854][00268] Avg episode reward: [(0, '5.253')]
[2024-09-18 11:24:35,848][00268] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 1015808. Throughput: 0: 875.2. Samples: 253218. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-18 11:24:35,855][00268] Avg episode reward: [(0, '5.111')]
[2024-09-18 11:24:37,394][03583] Updated weights for policy 0, policy_version 250 (0.0012)
[2024-09-18 11:24:40,848][00268] Fps is (10 sec: 3686.4, 60 sec: 3550.0, 300 sec: 3512.8). Total num frames: 1036288. Throughput: 0: 879.7. Samples: 259458. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-18 11:24:40,850][00268] Avg episode reward: [(0, '5.163')]
[2024-09-18 11:24:45,848][00268] Fps is (10 sec: 3686.4, 60 sec: 3618.3, 300 sec: 3554.5). Total num frames: 1052672. Throughput: 0: 902.5. Samples: 262432. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:24:45,852][00268] Avg episode reward: [(0, '5.169')]
[2024-09-18 11:24:49,232][03583] Updated weights for policy 0, policy_version 260 (0.0014)
[2024-09-18 11:24:50,849][00268] Fps is (10 sec: 3276.2, 60 sec: 3549.8, 300 sec: 3526.7). Total num frames: 1069056. Throughput: 0: 890.7. Samples: 266796. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:24:50,857][00268] Avg episode reward: [(0, '5.309')]
[2024-09-18 11:24:50,861][03570] Saving new best policy, reward=5.309!
[2024-09-18 11:24:55,848][00268] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 1089536. Throughput: 0: 890.2. Samples: 272918. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:24:55,850][00268] Avg episode reward: [(0, '4.976')]
[2024-09-18 11:24:59,387][03583] Updated weights for policy 0, policy_version 270 (0.0016)
[2024-09-18 11:25:00,848][00268] Fps is (10 sec: 3687.1, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 1105920. Throughput: 0: 906.7. Samples: 276034. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:25:00,855][00268] Avg episode reward: [(0, '5.349')]
[2024-09-18 11:25:00,858][03570] Saving new best policy, reward=5.349!
[2024-09-18 11:25:05,848][00268] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 1122304. Throughput: 0: 889.4. Samples: 280008. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:25:05,849][00268] Avg episode reward: [(0, '5.447')]
[2024-09-18 11:25:05,863][03570] Saving new best policy, reward=5.447!
[2024-09-18 11:25:10,848][00268] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 1142784. Throughput: 0: 890.8. Samples: 286222. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:25:10,850][00268] Avg episode reward: [(0, '5.860')]
[2024-09-18 11:25:10,856][03570] Saving new best policy, reward=5.860!
[2024-09-18 11:25:11,208][03583] Updated weights for policy 0, policy_version 280 (0.0021)
[2024-09-18 11:25:15,848][00268] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3540.6). Total num frames: 1159168. Throughput: 0: 901.0. Samples: 289240. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:25:15,851][00268] Avg episode reward: [(0, '5.828')]
[2024-09-18 11:25:15,891][03570] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000284_1163264.pth...
[2024-09-18 11:25:16,046][03570] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000077_315392.pth
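The Saving/Removing pair above shows checkpoint rotation: a new `checkpoint_<version>_<frames>.pth` is written and the oldest one is deleted. A sketch of keep-latest-N rotation consistent with those filenames (the `keep_last` count and helper name are assumptions for illustration):

```python
import os
import re

def rotate_checkpoints(checkpoint_dir, keep_last=2):
    """Delete all but the newest `keep_last` checkpoints, ordered by the
    policy version embedded in names like checkpoint_000000077_315392.pth."""
    pattern = re.compile(r"checkpoint_(\d+)_(\d+)\.pth$")
    files = [f for f in os.listdir(checkpoint_dir) if pattern.search(f)]
    files.sort(key=lambda f: int(pattern.search(f).group(1)))
    removed = []
    for f in files[:-keep_last]:
        os.remove(os.path.join(checkpoint_dir, f))
        removed.append(f)
    return removed
```

Rotation bounds disk usage during long runs while keeping a recent checkpoint or two for resuming; note the "best policy" saves are tracked separately from this rolling window.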
[2024-09-18 11:25:20,848][00268] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3526.8). Total num frames: 1175552. Throughput: 0: 891.6. Samples: 293340. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:25:20,851][00268] Avg episode reward: [(0, '5.563')]
[2024-09-18 11:25:23,239][03583] Updated weights for policy 0, policy_version 290 (0.0017)
[2024-09-18 11:25:25,848][00268] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 1196032. Throughput: 0: 888.6. Samples: 299446. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-18 11:25:25,850][00268] Avg episode reward: [(0, '5.255')]
[2024-09-18 11:25:30,848][00268] Fps is (10 sec: 4095.9, 60 sec: 3618.1, 300 sec: 3554.5). Total num frames: 1216512. Throughput: 0: 891.7. Samples: 302560. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:25:30,856][00268] Avg episode reward: [(0, '5.591')]
[2024-09-18 11:25:34,894][03583] Updated weights for policy 0, policy_version 300 (0.0018)
[2024-09-18 11:25:35,850][00268] Fps is (10 sec: 3275.9, 60 sec: 3549.7, 300 sec: 3540.6). Total num frames: 1228800. Throughput: 0: 894.3. Samples: 307042. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-18 11:25:35,853][00268] Avg episode reward: [(0, '5.911')]
[2024-09-18 11:25:35,868][03570] Saving new best policy, reward=5.911!
[2024-09-18 11:25:40,848][00268] Fps is (10 sec: 3276.9, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 1249280. Throughput: 0: 885.8. Samples: 312778. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:25:40,851][00268] Avg episode reward: [(0, '6.271')]
[2024-09-18 11:25:40,868][03570] Saving new best policy, reward=6.271!
[2024-09-18 11:25:45,264][03583] Updated weights for policy 0, policy_version 310 (0.0012)
[2024-09-18 11:25:45,848][00268] Fps is (10 sec: 4097.1, 60 sec: 3618.1, 300 sec: 3554.5). Total num frames: 1269760. Throughput: 0: 884.1. Samples: 315820. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-18 11:25:45,851][00268] Avg episode reward: [(0, '6.389')]
[2024-09-18 11:25:45,869][03570] Saving new best policy, reward=6.389!
[2024-09-18 11:25:50,848][00268] Fps is (10 sec: 3276.8, 60 sec: 3550.0, 300 sec: 3540.6). Total num frames: 1282048. Throughput: 0: 902.2. Samples: 320606. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:25:50,850][00268] Avg episode reward: [(0, '6.987')]
[2024-09-18 11:25:50,859][03570] Saving new best policy, reward=6.987!
[2024-09-18 11:25:55,848][00268] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 1302528. Throughput: 0: 884.4. Samples: 326018. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:25:55,852][00268] Avg episode reward: [(0, '7.429')]
[2024-09-18 11:25:55,862][03570] Saving new best policy, reward=7.429!
[2024-09-18 11:25:57,393][03583] Updated weights for policy 0, policy_version 320 (0.0013)
[2024-09-18 11:26:00,848][00268] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3554.5). Total num frames: 1323008. Throughput: 0: 884.9. Samples: 329060. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:26:00,856][00268] Avg episode reward: [(0, '7.123')]
[2024-09-18 11:26:05,848][00268] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 1335296. Throughput: 0: 907.6. Samples: 334182. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:26:05,852][00268] Avg episode reward: [(0, '7.216')]
[2024-09-18 11:26:09,269][03583] Updated weights for policy 0, policy_version 330 (0.0013)
[2024-09-18 11:26:10,847][00268] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 1355776. Throughput: 0: 888.6. Samples: 339432. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:26:10,850][00268] Avg episode reward: [(0, '7.243')]
[2024-09-18 11:26:15,848][00268] Fps is (10 sec: 4095.9, 60 sec: 3618.1, 300 sec: 3554.5). Total num frames: 1376256. Throughput: 0: 888.2. Samples: 342528. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:26:15,851][00268] Avg episode reward: [(0, '8.040')]
[2024-09-18 11:26:15,863][03570] Saving new best policy, reward=8.040!
[2024-09-18 11:26:20,444][03583] Updated weights for policy 0, policy_version 340 (0.0013)
[2024-09-18 11:26:20,848][00268] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3554.5). Total num frames: 1392640. Throughput: 0: 905.2. Samples: 347774. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-18 11:26:20,850][00268] Avg episode reward: [(0, '8.616')]
[2024-09-18 11:26:20,854][03570] Saving new best policy, reward=8.616!
[2024-09-18 11:26:25,848][00268] Fps is (10 sec: 3276.9, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 1409024. Throughput: 0: 886.6. Samples: 352674. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:26:25,855][00268] Avg episode reward: [(0, '9.240')]
[2024-09-18 11:26:25,876][03570] Saving new best policy, reward=9.240!
[2024-09-18 11:26:30,848][00268] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 1429504. Throughput: 0: 885.4. Samples: 355662. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-18 11:26:30,849][00268] Avg episode reward: [(0, '8.832')]
[2024-09-18 11:26:31,283][03583] Updated weights for policy 0, policy_version 350 (0.0016)
[2024-09-18 11:26:35,848][00268] Fps is (10 sec: 3686.4, 60 sec: 3618.3, 300 sec: 3554.5). Total num frames: 1445888. Throughput: 0: 900.5. Samples: 361128. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:26:35,850][00268] Avg episode reward: [(0, '7.798')]
[2024-09-18 11:26:40,848][00268] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 1462272. Throughput: 0: 887.9. Samples: 365974. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:26:40,850][00268] Avg episode reward: [(0, '7.667')]
[2024-09-18 11:26:43,242][03583] Updated weights for policy 0, policy_version 360 (0.0025)
[2024-09-18 11:26:45,848][00268] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 1482752. Throughput: 0: 891.3. Samples: 369168. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-18 11:26:45,858][00268] Avg episode reward: [(0, '8.492')]
[2024-09-18 11:26:50,848][00268] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3554.5). Total num frames: 1499136. Throughput: 0: 905.6. Samples: 374932. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-18 11:26:50,853][00268] Avg episode reward: [(0, '8.853')]
[2024-09-18 11:26:55,177][03583] Updated weights for policy 0, policy_version 370 (0.0013)
[2024-09-18 11:26:55,848][00268] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 1515520. Throughput: 0: 889.4. Samples: 379454. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-18 11:26:55,853][00268] Avg episode reward: [(0, '9.710')]
[2024-09-18 11:26:55,862][03570] Saving new best policy, reward=9.710!
[2024-09-18 11:27:00,848][00268] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 1536000. Throughput: 0: 886.7. Samples: 382428. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-18 11:27:00,854][00268] Avg episode reward: [(0, '10.164')]
[2024-09-18 11:27:00,862][03570] Saving new best policy, reward=10.164!
[2024-09-18 11:27:05,848][00268] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 1552384. Throughput: 0: 898.0. Samples: 388182. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:27:05,857][00268] Avg episode reward: [(0, '10.795')]
[2024-09-18 11:27:05,872][03570] Saving new best policy, reward=10.795!
[2024-09-18 11:27:06,323][03583] Updated weights for policy 0, policy_version 380 (0.0015)
[2024-09-18 11:27:10,848][00268] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 1568768. Throughput: 0: 881.2. Samples: 392330. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:27:10,854][00268] Avg episode reward: [(0, '10.806')]
[2024-09-18 11:27:10,857][03570] Saving new best policy, reward=10.806!
[2024-09-18 11:27:15,848][00268] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 1589248. Throughput: 0: 882.7. Samples: 395384. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:27:15,854][00268] Avg episode reward: [(0, '10.794')]
[2024-09-18 11:27:15,863][03570] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000388_1589248.pth...
[2024-09-18 11:27:15,976][03570] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000179_733184.pth
[2024-09-18 11:27:17,520][03583] Updated weights for policy 0, policy_version 390 (0.0013)
[2024-09-18 11:27:20,850][00268] Fps is (10 sec: 3685.6, 60 sec: 3549.7, 300 sec: 3554.5). Total num frames: 1605632. Throughput: 0: 897.5. Samples: 401518. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-18 11:27:20,853][00268] Avg episode reward: [(0, '10.299')]
[2024-09-18 11:27:25,847][00268] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 1622016. Throughput: 0: 879.3. Samples: 405544. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-18 11:27:25,854][00268] Avg episode reward: [(0, '9.965')]
[2024-09-18 11:27:29,529][03583] Updated weights for policy 0, policy_version 400 (0.0017)
[2024-09-18 11:27:30,847][00268] Fps is (10 sec: 3687.3, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 1642496. Throughput: 0: 877.2. Samples: 408642. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-18 11:27:30,850][00268] Avg episode reward: [(0, '9.577')]
[2024-09-18 11:27:35,848][00268] Fps is (10 sec: 3686.3, 60 sec: 3549.8, 300 sec: 3554.5). Total num frames: 1658880. Throughput: 0: 884.9. Samples: 414752. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:27:35,855][00268] Avg episode reward: [(0, '9.911')]
[2024-09-18 11:27:40,848][00268] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 1675264. Throughput: 0: 876.9. Samples: 418916. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-18 11:27:40,850][00268] Avg episode reward: [(0, '10.083')]
[2024-09-18 11:27:41,905][03583] Updated weights for policy 0, policy_version 410 (0.0017)
[2024-09-18 11:27:45,848][00268] Fps is (10 sec: 3686.5, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 1695744. Throughput: 0: 875.8. Samples: 421838. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-18 11:27:45,851][00268] Avg episode reward: [(0, '10.960')]
[2024-09-18 11:27:45,860][03570] Saving new best policy, reward=10.960!
[2024-09-18 11:27:50,848][00268] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 1716224. Throughput: 0: 886.2. Samples: 428062. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-18 11:27:50,853][00268] Avg episode reward: [(0, '11.777')]
[2024-09-18 11:27:50,858][03570] Saving new best policy, reward=11.777!
[2024-09-18 11:27:52,255][03583] Updated weights for policy 0, policy_version 420 (0.0014)
[2024-09-18 11:27:55,848][00268] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 1728512. Throughput: 0: 891.2. Samples: 432434. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:27:55,857][00268] Avg episode reward: [(0, '11.729')]
[2024-09-18 11:28:00,848][00268] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3540.6). Total num frames: 1744896. Throughput: 0: 881.7. Samples: 435062. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-18 11:28:00,850][00268] Avg episode reward: [(0, '12.197')]
[2024-09-18 11:28:00,853][03570] Saving new best policy, reward=12.197!
[2024-09-18 11:28:04,037][03583] Updated weights for policy 0, policy_version 430 (0.0024)
[2024-09-18 11:28:05,847][00268] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 1765376. Throughput: 0: 878.9. Samples: 441066. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:28:05,849][00268] Avg episode reward: [(0, '12.195')]
[2024-09-18 11:28:10,848][00268] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 1781760. Throughput: 0: 897.3. Samples: 445922. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:28:10,853][00268] Avg episode reward: [(0, '12.103')]
[2024-09-18 11:28:15,849][00268] Fps is (10 sec: 3276.4, 60 sec: 3481.5, 300 sec: 3540.6). Total num frames: 1798144. Throughput: 0: 881.8. Samples: 448322. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:28:15,851][00268] Avg episode reward: [(0, '13.471')]
[2024-09-18 11:28:15,929][03570] Saving new best policy, reward=13.471!
[2024-09-18 11:28:15,930][03583] Updated weights for policy 0, policy_version 440 (0.0017)
[2024-09-18 11:28:20,848][00268] Fps is (10 sec: 3686.4, 60 sec: 3550.0, 300 sec: 3554.5). Total num frames: 1818624. Throughput: 0: 882.8. Samples: 454478. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:28:20,855][00268] Avg episode reward: [(0, '14.444')]
[2024-09-18 11:28:20,879][03570] Saving new best policy, reward=14.444!
[2024-09-18 11:28:25,848][00268] Fps is (10 sec: 3686.8, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 1835008. Throughput: 0: 901.2. Samples: 459468. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-18 11:28:25,852][00268] Avg episode reward: [(0, '14.270')]
[2024-09-18 11:28:28,064][03583] Updated weights for policy 0, policy_version 450 (0.0017)
[2024-09-18 11:28:30,847][00268] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3540.6). Total num frames: 1851392. Throughput: 0: 882.2. Samples: 461536. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:28:30,853][00268] Avg episode reward: [(0, '14.892')]
[2024-09-18 11:28:30,856][03570] Saving new best policy, reward=14.892!
[2024-09-18 11:28:35,847][00268] Fps is (10 sec: 3686.5, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 1871872. Throughput: 0: 878.8. Samples: 467610. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:28:35,850][00268] Avg episode reward: [(0, '13.119')]
[2024-09-18 11:28:37,997][03583] Updated weights for policy 0, policy_version 460 (0.0018)
[2024-09-18 11:28:40,847][00268] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 1888256. Throughput: 0: 899.7. Samples: 472920. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:28:40,853][00268] Avg episode reward: [(0, '12.122')]
[2024-09-18 11:28:45,847][00268] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3554.5). Total num frames: 1904640. Throughput: 0: 884.1. Samples: 474848. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-18 11:28:45,850][00268] Avg episode reward: [(0, '12.922')]
[2024-09-18 11:28:49,840][03583] Updated weights for policy 0, policy_version 470 (0.0015)
[2024-09-18 11:28:50,847][00268] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3554.5). Total num frames: 1925120. Throughput: 0: 889.7. Samples: 481104. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:28:50,851][00268] Avg episode reward: [(0, '12.592')]
[2024-09-18 11:28:55,848][00268] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 1945600. Throughput: 0: 903.8. Samples: 486592. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:28:55,853][00268] Avg episode reward: [(0, '13.595')]
[2024-09-18 11:29:00,848][00268] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 1961984. Throughput: 0: 895.8. Samples: 488632. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:29:00,854][00268] Avg episode reward: [(0, '14.362')]
[2024-09-18 11:29:01,721][03583] Updated weights for policy 0, policy_version 480 (0.0021)
[2024-09-18 11:29:05,847][00268] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 1982464. Throughput: 0: 888.2. Samples: 494448. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-18 11:29:05,850][00268] Avg episode reward: [(0, '14.819')]
[2024-09-18 11:29:10,848][00268] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3582.3). Total num frames: 1998848. Throughput: 0: 904.7. Samples: 500180. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:29:10,852][00268] Avg episode reward: [(0, '15.540')]
[2024-09-18 11:29:10,857][03570] Saving new best policy, reward=15.540!
[2024-09-18 11:29:13,166][03583] Updated weights for policy 0, policy_version 490 (0.0017)
[2024-09-18 11:29:15,848][00268] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 2011136. Throughput: 0: 901.7. Samples: 502112. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-18 11:29:15,850][00268] Avg episode reward: [(0, '16.066')]
[2024-09-18 11:29:15,859][00268] Components not started: RolloutWorker_w0, RolloutWorker_w6, RolloutWorker_w7, wait_time=600.0 seconds
[2024-09-18 11:29:15,911][03570] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000492_2015232.pth...
[2024-09-18 11:29:16,010][03570] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000284_1163264.pth
[2024-09-18 11:29:16,024][03570] Saving new best policy, reward=16.066!
[2024-09-18 11:29:20,847][00268] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 2031616. Throughput: 0: 891.4. Samples: 507724. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:29:20,850][00268] Avg episode reward: [(0, '16.859')]
[2024-09-18 11:29:20,930][03570] Saving new best policy, reward=16.859!
[2024-09-18 11:29:23,907][03583] Updated weights for policy 0, policy_version 500 (0.0018)
[2024-09-18 11:29:25,848][00268] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 2052096. Throughput: 0: 903.8. Samples: 513592. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:29:25,855][00268] Avg episode reward: [(0, '15.709')]
[2024-09-18 11:29:30,848][00268] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 2064384. Throughput: 0: 905.2. Samples: 515584. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-18 11:29:30,849][00268] Avg episode reward: [(0, '15.991')]
[2024-09-18 11:29:35,848][00268] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 2084864. Throughput: 0: 886.4. Samples: 520990. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-18 11:29:35,856][00268] Avg episode reward: [(0, '16.446')]
[2024-09-18 11:29:35,893][03583] Updated weights for policy 0, policy_version 510 (0.0015)
[2024-09-18 11:29:40,847][00268] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 2105344. Throughput: 0: 901.0. Samples: 527138. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-18 11:29:40,855][00268] Avg episode reward: [(0, '16.243')]
[2024-09-18 11:29:45,848][00268] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 2117632. Throughput: 0: 899.7. Samples: 529118. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:29:45,850][00268] Avg episode reward: [(0, '16.109')]
[2024-09-18 11:29:47,893][03583] Updated weights for policy 0, policy_version 520 (0.0013)
[2024-09-18 11:29:50,848][00268] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 2138112. Throughput: 0: 887.4. Samples: 534380. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-18 11:29:50,850][00268] Avg episode reward: [(0, '16.686')]
[2024-09-18 11:29:55,849][00268] Fps is (10 sec: 4504.9, 60 sec: 3618.0, 300 sec: 3582.2). Total num frames: 2162688. Throughput: 0: 899.2. Samples: 540646. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:29:55,852][00268] Avg episode reward: [(0, '17.365')]
[2024-09-18 11:29:55,862][03570] Saving new best policy, reward=17.365!
[2024-09-18 11:29:58,641][03583] Updated weights for policy 0, policy_version 530 (0.0014)
[2024-09-18 11:30:00,850][00268] Fps is (10 sec: 3685.4, 60 sec: 3549.7, 300 sec: 3568.3). Total num frames: 2174976. Throughput: 0: 903.5. Samples: 542774. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-18 11:30:00,855][00268] Avg episode reward: [(0, '17.931')]
[2024-09-18 11:30:00,861][03570] Saving new best policy, reward=17.931!
[2024-09-18 11:30:05,848][00268] Fps is (10 sec: 2867.6, 60 sec: 3481.6, 300 sec: 3554.5). Total num frames: 2191360. Throughput: 0: 887.5. Samples: 547662. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-18 11:30:05,850][00268] Avg episode reward: [(0, '17.475')]
[2024-09-18 11:30:09,802][03583] Updated weights for policy 0, policy_version 540 (0.0013)
[2024-09-18 11:30:10,848][00268] Fps is (10 sec: 4097.1, 60 sec: 3618.1, 300 sec: 3582.3). Total num frames: 2215936. Throughput: 0: 896.0. Samples: 553912. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:30:10,849][00268] Avg episode reward: [(0, '17.394')]
[2024-09-18 11:30:15,848][00268] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 2228224. Throughput: 0: 904.4. Samples: 556284. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:30:15,854][00268] Avg episode reward: [(0, '16.805')]
[2024-09-18 11:30:20,848][00268] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 2248704. Throughput: 0: 891.7. Samples: 561116. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-18 11:30:20,851][00268] Avg episode reward: [(0, '15.713')]
[2024-09-18 11:30:21,740][03583] Updated weights for policy 0, policy_version 550 (0.0022)
[2024-09-18 11:30:25,848][00268] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 2269184. Throughput: 0: 895.4. Samples: 567432. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-18 11:30:25,854][00268] Avg episode reward: [(0, '15.488')]
[2024-09-18 11:30:30,848][00268] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 2281472. Throughput: 0: 909.5. Samples: 570046. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-18 11:30:30,854][00268] Avg episode reward: [(0, '16.843')]
[2024-09-18 11:30:33,611][03583] Updated weights for policy 0, policy_version 560 (0.0012)
[2024-09-18 11:30:35,847][00268] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 2301952. Throughput: 0: 892.7. Samples: 574550. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:30:35,855][00268] Avg episode reward: [(0, '17.624')]
[2024-09-18 11:30:40,848][00268] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 2322432. Throughput: 0: 892.0. Samples: 580784. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:30:40,851][00268] Avg episode reward: [(0, '17.272')]
[2024-09-18 11:30:43,867][03583] Updated weights for policy 0, policy_version 570 (0.0021)
[2024-09-18 11:30:45,848][00268] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3582.3). Total num frames: 2338816. Throughput: 0: 910.4. Samples: 583740. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:30:45,853][00268] Avg episode reward: [(0, '17.620')]
[2024-09-18 11:30:50,848][00268] Fps is (10 sec: 3276.7, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 2355200. Throughput: 0: 898.4. Samples: 588090. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:30:50,850][00268] Avg episode reward: [(0, '18.075')]
[2024-09-18 11:30:50,854][03570] Saving new best policy, reward=18.075!
[2024-09-18 11:30:55,663][03583] Updated weights for policy 0, policy_version 580 (0.0024)
[2024-09-18 11:30:55,848][00268] Fps is (10 sec: 3686.4, 60 sec: 3550.0, 300 sec: 3568.4). Total num frames: 2375680. Throughput: 0: 892.8. Samples: 594090. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-18 11:30:55,852][00268] Avg episode reward: [(0, '17.052')]
[2024-09-18 11:31:00,848][00268] Fps is (10 sec: 3686.5, 60 sec: 3618.3, 300 sec: 3582.3). Total num frames: 2392064. Throughput: 0: 907.0. Samples: 597100. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-18 11:31:00,851][00268] Avg episode reward: [(0, '16.923')]
[2024-09-18 11:31:05,848][00268] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 2408448. Throughput: 0: 888.0. Samples: 601076. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:31:05,850][00268] Avg episode reward: [(0, '16.540')]
[2024-09-18 11:31:07,736][03583] Updated weights for policy 0, policy_version 590 (0.0013)
[2024-09-18 11:31:10,849][00268] Fps is (10 sec: 3276.3, 60 sec: 3481.5, 300 sec: 3554.5). Total num frames: 2424832. Throughput: 0: 884.0. Samples: 607214. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-18 11:31:10,851][00268] Avg episode reward: [(0, '17.007')]
[2024-09-18 11:31:15,849][00268] Fps is (10 sec: 3685.7, 60 sec: 3618.0, 300 sec: 3568.4). Total num frames: 2445312. Throughput: 0: 896.5. Samples: 610388. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:31:15,852][00268] Avg episode reward: [(0, '17.184')]
[2024-09-18 11:31:15,865][03570] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000597_2445312.pth...
[2024-09-18 11:31:15,995][03570] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000388_1589248.pth
[2024-09-18 11:31:19,595][03583] Updated weights for policy 0, policy_version 600 (0.0013)
[2024-09-18 11:31:20,848][00268] Fps is (10 sec: 3686.9, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 2461696. Throughput: 0: 891.3. Samples: 614658. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:31:20,854][00268] Avg episode reward: [(0, '17.517')]
[2024-09-18 11:31:25,848][00268] Fps is (10 sec: 3687.0, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 2482176. Throughput: 0: 887.4. Samples: 620718. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:31:25,856][00268] Avg episode reward: [(0, '17.887')]
[2024-09-18 11:31:29,589][03583] Updated weights for policy 0, policy_version 610 (0.0018)
[2024-09-18 11:31:30,850][00268] Fps is (10 sec: 4095.1, 60 sec: 3686.3, 300 sec: 3582.2). Total num frames: 2502656. Throughput: 0: 891.9. Samples: 623876. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:31:30,854][00268] Avg episode reward: [(0, '18.923')]
[2024-09-18 11:31:30,856][03570] Saving new best policy, reward=18.923!
[2024-09-18 11:31:35,848][00268] Fps is (10 sec: 3276.9, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 2514944. Throughput: 0: 894.1. Samples: 628324. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:31:35,854][00268] Avg episode reward: [(0, '19.366')]
[2024-09-18 11:31:35,866][03570] Saving new best policy, reward=19.366!
[2024-09-18 11:31:40,847][00268] Fps is (10 sec: 3277.5, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 2535424. Throughput: 0: 886.7. Samples: 633990. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:31:40,850][00268] Avg episode reward: [(0, '17.481')]
[2024-09-18 11:31:41,493][03583] Updated weights for policy 0, policy_version 620 (0.0012)
[2024-09-18 11:31:45,848][00268] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3582.3). Total num frames: 2555904. Throughput: 0: 890.4. Samples: 637170. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-18 11:31:45,853][00268] Avg episode reward: [(0, '18.259')]
[2024-09-18 11:31:50,851][00268] Fps is (10 sec: 3275.6, 60 sec: 3549.7, 300 sec: 3568.3). Total num frames: 2568192. Throughput: 0: 908.9. Samples: 641978. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:31:50,856][00268] Avg episode reward: [(0, '17.725')]
[2024-09-18 11:31:53,648][03583] Updated weights for policy 0, policy_version 630 (0.0015)
[2024-09-18 11:31:55,848][00268] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 2588672. Throughput: 0: 892.3. Samples: 647368. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-18 11:31:55,850][00268] Avg episode reward: [(0, '17.782')]
[2024-09-18 11:32:00,848][00268] Fps is (10 sec: 4097.5, 60 sec: 3618.1, 300 sec: 3582.3). Total num frames: 2609152. Throughput: 0: 890.1. Samples: 650442. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-18 11:32:00,850][00268] Avg episode reward: [(0, '17.894')]
[2024-09-18 11:32:04,653][03583] Updated weights for policy 0, policy_version 640 (0.0016)
[2024-09-18 11:32:05,848][00268] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 2621440. Throughput: 0: 907.6. Samples: 655500. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:32:05,853][00268] Avg episode reward: [(0, '19.030')]
[2024-09-18 11:32:10,848][00268] Fps is (10 sec: 3276.8, 60 sec: 3618.2, 300 sec: 3568.4). Total num frames: 2641920. Throughput: 0: 885.4. Samples: 660562. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:32:10,850][00268] Avg episode reward: [(0, '19.598')]
[2024-09-18 11:32:10,852][03570] Saving new best policy, reward=19.598!
[2024-09-18 11:32:15,693][03583] Updated weights for policy 0, policy_version 650 (0.0019)
[2024-09-18 11:32:15,848][00268] Fps is (10 sec: 4096.0, 60 sec: 3618.2, 300 sec: 3582.3). Total num frames: 2662400. Throughput: 0: 881.7. Samples: 663552. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:32:15,850][00268] Avg episode reward: [(0, '20.710')]
[2024-09-18 11:32:15,857][03570] Saving new best policy, reward=20.710!
[2024-09-18 11:32:20,850][00268] Fps is (10 sec: 3275.9, 60 sec: 3549.7, 300 sec: 3568.3). Total num frames: 2674688. Throughput: 0: 900.9. Samples: 668866. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:32:20,852][00268] Avg episode reward: [(0, '20.401')]
[2024-09-18 11:32:25,848][00268] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 2695168. Throughput: 0: 883.5. Samples: 673748. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:32:25,853][00268] Avg episode reward: [(0, '19.838')]
[2024-09-18 11:32:27,773][03583] Updated weights for policy 0, policy_version 660 (0.0018)
[2024-09-18 11:32:30,848][00268] Fps is (10 sec: 4097.1, 60 sec: 3550.0, 300 sec: 3582.3). Total num frames: 2715648. Throughput: 0: 880.7. Samples: 676800. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:32:30,854][00268] Avg episode reward: [(0, '19.531')]
[2024-09-18 11:32:35,848][00268] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3582.3). Total num frames: 2732032. Throughput: 0: 898.8. Samples: 682422. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:32:35,850][00268] Avg episode reward: [(0, '19.685')]
[2024-09-18 11:32:39,840][03583] Updated weights for policy 0, policy_version 670 (0.0015)
[2024-09-18 11:32:40,848][00268] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3554.5). Total num frames: 2744320. Throughput: 0: 881.2. Samples: 687022. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-18 11:32:40,852][00268] Avg episode reward: [(0, '19.965')]
[2024-09-18 11:32:45,848][00268] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 2768896. Throughput: 0: 881.8. Samples: 690122. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:32:45,850][00268] Avg episode reward: [(0, '21.358')]
[2024-09-18 11:32:45,861][03570] Saving new best policy, reward=21.358!
[2024-09-18 11:32:50,622][03583] Updated weights for policy 0, policy_version 680 (0.0017)
[2024-09-18 11:32:50,848][00268] Fps is (10 sec: 4096.0, 60 sec: 3618.4, 300 sec: 3582.3). Total num frames: 2785280. Throughput: 0: 899.6. Samples: 695984. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-18 11:32:50,852][00268] Avg episode reward: [(0, '21.131')]
[2024-09-18 11:32:55,848][00268] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3568.4). Total num frames: 2797568. Throughput: 0: 882.0. Samples: 700250. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-09-18 11:32:55,851][00268] Avg episode reward: [(0, '21.886')]
[2024-09-18 11:32:55,914][03570] Saving new best policy, reward=21.886!
[2024-09-18 11:33:00,848][00268] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3568.4). Total num frames: 2818048. Throughput: 0: 882.1. Samples: 703246. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-18 11:33:00,849][00268] Avg episode reward: [(0, '22.067')]
[2024-09-18 11:33:00,933][03570] Saving new best policy, reward=22.067!
[2024-09-18 11:33:02,074][03583] Updated weights for policy 0, policy_version 690 (0.0014)
[2024-09-18 11:33:05,849][00268] Fps is (10 sec: 4095.5, 60 sec: 3618.1, 300 sec: 3582.2). Total num frames: 2838528. Throughput: 0: 900.7. Samples: 709398. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-18 11:33:05,856][00268] Avg episode reward: [(0, '21.232')]
[2024-09-18 11:33:10,848][00268] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3568.4). Total num frames: 2850816. Throughput: 0: 880.1. Samples: 713352. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-18 11:33:10,850][00268] Avg episode reward: [(0, '20.798')]
[2024-09-18 11:33:14,017][03583] Updated weights for policy 0, policy_version 700 (0.0019)
[2024-09-18 11:33:15,848][00268] Fps is (10 sec: 3277.2, 60 sec: 3481.6, 300 sec: 3568.4). Total num frames: 2871296. Throughput: 0: 881.5. Samples: 716468. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-18 11:33:15,850][00268] Avg episode reward: [(0, '21.792')]
[2024-09-18 11:33:15,858][03570] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000701_2871296.pth...
[2024-09-18 11:33:15,957][03570] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000492_2015232.pth
[2024-09-18 11:33:20,848][00268] Fps is (10 sec: 4095.9, 60 sec: 3618.3, 300 sec: 3582.3). Total num frames: 2891776. Throughput: 0: 892.4. Samples: 722582. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:33:20,856][00268] Avg episode reward: [(0, '21.790')]
[2024-09-18 11:33:25,848][00268] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3568.4). Total num frames: 2904064. Throughput: 0: 883.4. Samples: 726774. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:33:25,854][00268] Avg episode reward: [(0, '21.234')]
[2024-09-18 11:33:25,936][03583] Updated weights for policy 0, policy_version 710 (0.0012)
[2024-09-18 11:33:30,848][00268] Fps is (10 sec: 3276.9, 60 sec: 3481.6, 300 sec: 3568.4). Total num frames: 2924544. Throughput: 0: 881.8. Samples: 729802. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:33:30,854][00268] Avg episode reward: [(0, '22.302')]
[2024-09-18 11:33:30,857][03570] Saving new best policy, reward=22.302!
[2024-09-18 11:33:35,848][00268] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3582.3). Total num frames: 2945024. Throughput: 0: 888.0. Samples: 735944. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-18 11:33:35,857][00268] Avg episode reward: [(0, '22.404')]
[2024-09-18 11:33:35,869][03570] Saving new best policy, reward=22.404!
[2024-09-18 11:33:36,405][03583] Updated weights for policy 0, policy_version 720 (0.0015)
[2024-09-18 11:33:40,848][00268] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 2957312. Throughput: 0: 887.0. Samples: 740164. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:33:40,855][00268] Avg episode reward: [(0, '23.180')]
[2024-09-18 11:33:40,857][03570] Saving new best policy, reward=23.180!
[2024-09-18 11:33:45,848][00268] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3568.4). Total num frames: 2977792. Throughput: 0: 880.3. Samples: 742858. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:33:45,849][00268] Avg episode reward: [(0, '21.975')]
[2024-09-18 11:33:48,257][03583] Updated weights for policy 0, policy_version 730 (0.0016)
[2024-09-18 11:33:50,848][00268] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 2998272. Throughput: 0: 881.0. Samples: 749044. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:33:50,850][00268] Avg episode reward: [(0, '22.104')]
[2024-09-18 11:33:55,848][00268] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 3014656. Throughput: 0: 897.0. Samples: 753716. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-18 11:33:55,849][00268] Avg episode reward: [(0, '22.080')]
[2024-09-18 11:34:00,226][03583] Updated weights for policy 0, policy_version 740 (0.0018)
[2024-09-18 11:34:00,848][00268] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 3031040. Throughput: 0: 883.1. Samples: 756208. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:34:00,849][00268] Avg episode reward: [(0, '22.730')]
[2024-09-18 11:34:05,848][00268] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 3051520. Throughput: 0: 884.7. Samples: 762392. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:34:05,850][00268] Avg episode reward: [(0, '21.880')]
[2024-09-18 11:34:10,849][00268] Fps is (10 sec: 3685.7, 60 sec: 3618.0, 300 sec: 3582.2). Total num frames: 3067904. Throughput: 0: 899.4. Samples: 767250. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:34:10,853][00268] Avg episode reward: [(0, '21.076')]
[2024-09-18 11:34:11,956][03583] Updated weights for policy 0, policy_version 750 (0.0012)
[2024-09-18 11:34:15,848][00268] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 3084288. Throughput: 0: 881.8. Samples: 769484. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-18 11:34:15,857][00268] Avg episode reward: [(0, '20.580')]
[2024-09-18 11:34:20,848][00268] Fps is (10 sec: 3687.1, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 3104768. Throughput: 0: 880.6. Samples: 775572. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-18 11:34:20,850][00268] Avg episode reward: [(0, '20.713')]
[2024-09-18 11:34:22,357][03583] Updated weights for policy 0, policy_version 760 (0.0013)
[2024-09-18 11:34:25,848][00268] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3582.3). Total num frames: 3121152. Throughput: 0: 902.5. Samples: 780776. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:34:25,850][00268] Avg episode reward: [(0, '21.570')]
[2024-09-18 11:34:30,848][00268] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 3137536. Throughput: 0: 886.6. Samples: 782754. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-18 11:34:30,850][00268] Avg episode reward: [(0, '21.134')]
[2024-09-18 11:34:34,200][03583] Updated weights for policy 0, policy_version 770 (0.0014)
[2024-09-18 11:34:35,848][00268] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 3158016. Throughput: 0: 886.8. Samples: 788948. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-18 11:34:35,850][00268] Avg episode reward: [(0, '21.014')]
[2024-09-18 11:34:40,848][00268] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3582.3). Total num frames: 3174400. Throughput: 0: 898.6. Samples: 794154. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-18 11:34:40,850][00268] Avg episode reward: [(0, '20.660')]
[2024-09-18 11:34:45,848][00268] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 3190784. Throughput: 0: 886.4. Samples: 796098. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:34:45,850][00268] Avg episode reward: [(0, '19.505')]
[2024-09-18 11:34:46,463][03583] Updated weights for policy 0, policy_version 780 (0.0015)
[2024-09-18 11:34:50,847][00268] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 3211264. Throughput: 0: 882.7. Samples: 802112. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-18 11:34:50,850][00268] Avg episode reward: [(0, '18.187')]
[2024-09-18 11:34:55,848][00268] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 3227648. Throughput: 0: 899.2. Samples: 807710. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:34:55,850][00268] Avg episode reward: [(0, '18.231')]
[2024-09-18 11:34:57,915][03583] Updated weights for policy 0, policy_version 790 (0.0013)
[2024-09-18 11:35:00,848][00268] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 3244032. Throughput: 0: 893.3. Samples: 809682. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-18 11:35:00,852][00268] Avg episode reward: [(0, '20.332')]
[2024-09-18 11:35:05,848][00268] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 3264512. Throughput: 0: 884.2. Samples: 815360. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:35:05,854][00268] Avg episode reward: [(0, '21.054')]
[2024-09-18 11:35:08,426][03583] Updated weights for policy 0, policy_version 800 (0.0015)
[2024-09-18 11:35:10,850][00268] Fps is (10 sec: 3685.4, 60 sec: 3549.8, 300 sec: 3568.3). Total num frames: 3280896. Throughput: 0: 898.9. Samples: 821228. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-18 11:35:10,853][00268] Avg episode reward: [(0, '20.805')]
[2024-09-18 11:35:15,848][00268] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 3297280. Throughput: 0: 898.4. Samples: 823182. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-09-18 11:35:15,853][00268] Avg episode reward: [(0, '21.634')]
[2024-09-18 11:35:15,868][03570] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000805_3297280.pth...
[2024-09-18 11:35:15,980][03570] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000597_2445312.pth
[2024-09-18 11:35:20,400][03583] Updated weights for policy 0, policy_version 810 (0.0013)
[2024-09-18 11:35:20,848][00268] Fps is (10 sec: 3687.4, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 3317760. Throughput: 0: 881.2. Samples: 828604. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:35:20,855][00268] Avg episode reward: [(0, '22.731')]
[2024-09-18 11:35:25,848][00268] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3582.3). Total num frames: 3338240. Throughput: 0: 901.0. Samples: 834700. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:35:25,850][00268] Avg episode reward: [(0, '22.013')]
[2024-09-18 11:35:30,848][00268] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 3350528. Throughput: 0: 901.7. Samples: 836676. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:35:30,850][00268] Avg episode reward: [(0, '21.434')]
[2024-09-18 11:35:32,478][03583] Updated weights for policy 0, policy_version 820 (0.0012)
[2024-09-18 11:35:35,848][00268] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 3371008. Throughput: 0: 886.2. Samples: 841992. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:35:35,850][00268] Avg episode reward: [(0, '20.932')]
[2024-09-18 11:35:40,848][00268] Fps is (10 sec: 4095.6, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 3391488. Throughput: 0: 896.7. Samples: 848062. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:35:40,850][00268] Avg episode reward: [(0, '20.247')]
[2024-09-18 11:35:43,472][03583] Updated weights for policy 0, policy_version 830 (0.0017)
[2024-09-18 11:35:45,849][00268] Fps is (10 sec: 3276.3, 60 sec: 3549.8, 300 sec: 3554.5). Total num frames: 3403776. Throughput: 0: 900.9. Samples: 850224. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-09-18 11:35:45,856][00268] Avg episode reward: [(0, '19.602')]
[2024-09-18 11:35:50,848][00268] Fps is (10 sec: 3277.1, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 3424256. Throughput: 0: 886.4. Samples: 855246. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-18 11:35:50,853][00268] Avg episode reward: [(0, '19.900')]
[2024-09-18 11:35:54,437][03583] Updated weights for policy 0, policy_version 840 (0.0021)
[2024-09-18 11:35:55,848][00268] Fps is (10 sec: 4096.6, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 3444736. Throughput: 0: 895.1. Samples: 861504. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:35:55,852][00268] Avg episode reward: [(0, '20.187')]
[2024-09-18 11:36:00,848][00268] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 3457024. Throughput: 0: 900.6. Samples: 863710. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-18 11:36:00,855][00268] Avg episode reward: [(0, '19.620')]
[2024-09-18 11:36:05,848][00268] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 3477504. Throughput: 0: 888.2. Samples: 868572. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:36:05,855][00268] Avg episode reward: [(0, '20.147')]
[2024-09-18 11:36:06,490][03583] Updated weights for policy 0, policy_version 850 (0.0019)
[2024-09-18 11:36:10,848][00268] Fps is (10 sec: 4096.0, 60 sec: 3618.3, 300 sec: 3568.4). Total num frames: 3497984. Throughput: 0: 886.3. Samples: 874584. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:36:10,850][00268] Avg episode reward: [(0, '20.240')]
[2024-09-18 11:36:15,851][00268] Fps is (10 sec: 3275.5, 60 sec: 3549.6, 300 sec: 3554.4). Total num frames: 3510272. Throughput: 0: 897.3. Samples: 877058. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:36:15,854][00268] Avg episode reward: [(0, '21.439')]
[2024-09-18 11:36:18,803][03583] Updated weights for policy 0, policy_version 860 (0.0014)
[2024-09-18 11:36:20,848][00268] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3540.6). Total num frames: 3526656. Throughput: 0: 879.5. Samples: 881570. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:36:20,850][00268] Avg episode reward: [(0, '20.837')]
[2024-09-18 11:36:25,848][00268] Fps is (10 sec: 4097.6, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 3551232. Throughput: 0: 880.8. Samples: 887696. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:36:25,855][00268] Avg episode reward: [(0, '21.172')]
[2024-09-18 11:36:29,501][03583] Updated weights for policy 0, policy_version 870 (0.0014)
[2024-09-18 11:36:30,849][00268] Fps is (10 sec: 3685.8, 60 sec: 3549.8, 300 sec: 3554.5). Total num frames: 3563520. Throughput: 0: 893.9. Samples: 890448. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-18 11:36:30,851][00268] Avg episode reward: [(0, '23.472')]
[2024-09-18 11:36:30,857][03570] Saving new best policy, reward=23.472!
[2024-09-18 11:36:35,848][00268] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3540.6). Total num frames: 3579904. Throughput: 0: 877.4. Samples: 894728. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:36:35,850][00268] Avg episode reward: [(0, '23.187')]
[2024-09-18 11:36:40,848][00268] Fps is (10 sec: 3687.0, 60 sec: 3481.7, 300 sec: 3540.6). Total num frames: 3600384. Throughput: 0: 871.7. Samples: 900732. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:36:40,849][00268] Avg episode reward: [(0, '23.378')]
[2024-09-18 11:36:41,157][03583] Updated weights for policy 0, policy_version 880 (0.0012)
[2024-09-18 11:36:45,848][00268] Fps is (10 sec: 3686.3, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 3616768. Throughput: 0: 888.1. Samples: 903676. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-18 11:36:45,856][00268] Avg episode reward: [(0, '23.717')]
[2024-09-18 11:36:45,869][03570] Saving new best policy, reward=23.717!
[2024-09-18 11:36:50,847][00268] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3540.6). Total num frames: 3633152. Throughput: 0: 869.3. Samples: 907690. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:36:50,850][00268] Avg episode reward: [(0, '24.130')]
[2024-09-18 11:36:50,853][03570] Saving new best policy, reward=24.130!
[2024-09-18 11:36:53,308][03583] Updated weights for policy 0, policy_version 890 (0.0018)
[2024-09-18 11:36:55,848][00268] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3540.6). Total num frames: 3653632. Throughput: 0: 871.2. Samples: 913790. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:36:55,853][00268] Avg episode reward: [(0, '26.798')]
[2024-09-18 11:36:55,873][03570] Saving new best policy, reward=26.798!
[2024-09-18 11:37:00,854][00268] Fps is (10 sec: 3683.9, 60 sec: 3549.5, 300 sec: 3554.4). Total num frames: 3670016. Throughput: 0: 882.7. Samples: 916780. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:37:00,857][00268] Avg episode reward: [(0, '25.346')]
[2024-09-18 11:37:05,438][03583] Updated weights for policy 0, policy_version 900 (0.0014)
[2024-09-18 11:37:05,848][00268] Fps is (10 sec: 3276.9, 60 sec: 3481.6, 300 sec: 3540.6). Total num frames: 3686400. Throughput: 0: 874.0. Samples: 920898. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:37:05,850][00268] Avg episode reward: [(0, '25.623')]
[2024-09-18 11:37:10,848][00268] Fps is (10 sec: 3688.9, 60 sec: 3481.6, 300 sec: 3540.6). Total num frames: 3706880. Throughput: 0: 874.0. Samples: 927024. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:37:10,855][00268] Avg episode reward: [(0, '26.154')]
[2024-09-18 11:37:15,805][03583] Updated weights for policy 0, policy_version 910 (0.0012)
[2024-09-18 11:37:15,848][00268] Fps is (10 sec: 4095.9, 60 sec: 3618.4, 300 sec: 3568.4). Total num frames: 3727360. Throughput: 0: 881.4. Samples: 930108. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:37:15,850][00268] Avg episode reward: [(0, '24.689')]
[2024-09-18 11:37:15,862][03570] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000910_3727360.pth...
[2024-09-18 11:37:16,006][03570] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000701_2871296.pth
[2024-09-18 11:37:20,848][00268] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 3739648. Throughput: 0: 880.9. Samples: 934370. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-18 11:37:20,850][00268] Avg episode reward: [(0, '24.609')]
[2024-09-18 11:37:25,848][00268] Fps is (10 sec: 3276.9, 60 sec: 3481.6, 300 sec: 3540.6). Total num frames: 3760128. Throughput: 0: 878.1. Samples: 940248. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:37:25,849][00268] Avg episode reward: [(0, '22.402')]
[2024-09-18 11:37:27,559][03583] Updated weights for policy 0, policy_version 920 (0.0013)
[2024-09-18 11:37:30,848][00268] Fps is (10 sec: 4096.0, 60 sec: 3618.2, 300 sec: 3554.5). Total num frames: 3780608. Throughput: 0: 880.2. Samples: 943284. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-18 11:37:30,853][00268] Avg episode reward: [(0, '20.913')]
[2024-09-18 11:37:35,848][00268] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 3792896. Throughput: 0: 894.9. Samples: 947960. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-18 11:37:35,854][00268] Avg episode reward: [(0, '19.927')]
[2024-09-18 11:37:39,713][03583] Updated weights for policy 0, policy_version 930 (0.0016)
[2024-09-18 11:37:40,848][00268] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 3813376. Throughput: 0: 881.6. Samples: 953460. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:37:40,850][00268] Avg episode reward: [(0, '21.082')]
[2024-09-18 11:37:45,848][00268] Fps is (10 sec: 4096.0, 60 sec: 3618.2, 300 sec: 3554.5). Total num frames: 3833856. Throughput: 0: 883.7. Samples: 956540. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-18 11:37:45,850][00268] Avg episode reward: [(0, '22.413')]
[2024-09-18 11:37:50,848][00268] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 3846144. Throughput: 0: 902.4. Samples: 961504. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:37:50,850][00268] Avg episode reward: [(0, '23.016')]
[2024-09-18 11:37:51,514][03583] Updated weights for policy 0, policy_version 940 (0.0013)
[2024-09-18 11:37:55,848][00268] Fps is (10 sec: 3276.7, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 3866624. Throughput: 0: 885.1. Samples: 966854. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-18 11:37:55,851][00268] Avg episode reward: [(0, '23.986')]
[2024-09-18 11:38:00,848][00268] Fps is (10 sec: 4096.0, 60 sec: 3618.5, 300 sec: 3554.5). Total num frames: 3887104. Throughput: 0: 884.0. Samples: 969890. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-18 11:38:00,849][00268] Avg episode reward: [(0, '24.067')]
[2024-09-18 11:38:01,531][03583] Updated weights for policy 0, policy_version 950 (0.0015)
[2024-09-18 11:38:05,848][00268] Fps is (10 sec: 3276.8, 60 sec: 3549.8, 300 sec: 3554.5). Total num frames: 3899392. Throughput: 0: 903.1. Samples: 975012. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-18 11:38:05,852][00268] Avg episode reward: [(0, '25.298')]
[2024-09-18 11:38:10,848][00268] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 3919872. Throughput: 0: 883.1. Samples: 979988. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:38:10,850][00268] Avg episode reward: [(0, '25.019')]
[2024-09-18 11:38:13,865][03583] Updated weights for policy 0, policy_version 960 (0.0023)
[2024-09-18 11:38:15,848][00268] Fps is (10 sec: 4096.1, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 3940352. Throughput: 0: 882.7. Samples: 983004. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-18 11:38:15,859][00268] Avg episode reward: [(0, '24.346')]
[2024-09-18 11:38:20,848][00268] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 3952640. Throughput: 0: 897.3. Samples: 988338. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-18 11:38:20,852][00268] Avg episode reward: [(0, '23.663')]
[2024-09-18 11:38:25,848][00268] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3540.6). Total num frames: 3969024. Throughput: 0: 880.9. Samples: 993100. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-18 11:38:25,855][00268] Avg episode reward: [(0, '23.525')]
[2024-09-18 11:38:25,922][03583] Updated weights for policy 0, policy_version 970 (0.0018)
[2024-09-18 11:38:30,848][00268] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3540.6). Total num frames: 3989504. Throughput: 0: 880.3. Samples: 996154. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-18 11:38:30,849][00268] Avg episode reward: [(0, '22.843')]
[2024-09-18 11:38:34,754][03570] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2024-09-18 11:38:34,773][00268] Component Batcher_0 stopped!
[2024-09-18 11:38:34,781][00268] Component RolloutWorker_w0 process died already! Don't wait for it.
[2024-09-18 11:38:34,783][00268] Component RolloutWorker_w6 process died already! Don't wait for it.
[2024-09-18 11:38:34,784][00268] Component RolloutWorker_w7 process died already! Don't wait for it.
[2024-09-18 11:38:34,764][03570] Stopping Batcher_0...
[2024-09-18 11:38:34,819][03570] Loop batcher_evt_loop terminating...
[2024-09-18 11:38:34,880][03570] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000805_3297280.pth
[2024-09-18 11:38:34,882][03583] Weights refcount: 2 0
[2024-09-18 11:38:34,888][03583] Stopping InferenceWorker_p0-w0...
[2024-09-18 11:38:34,891][03583] Loop inference_proc0-0_evt_loop terminating...
[2024-09-18 11:38:34,892][00268] Component InferenceWorker_p0-w0 stopped!
[2024-09-18 11:38:34,898][03570] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2024-09-18 11:38:35,170][03570] Stopping LearnerWorker_p0...
[2024-09-18 11:38:35,172][03570] Loop learner_proc0_evt_loop terminating...
[2024-09-18 11:38:35,170][00268] Component LearnerWorker_p0 stopped!
[2024-09-18 11:38:35,431][00268] Component RolloutWorker_w4 stopped!
[2024-09-18 11:38:35,431][03590] Stopping RolloutWorker_w4...
[2024-09-18 11:38:35,444][03590] Loop rollout_proc4_evt_loop terminating...
[2024-09-18 11:38:35,493][00268] Component RolloutWorker_w2 stopped!
[2024-09-18 11:38:35,493][03586] Stopping RolloutWorker_w2...
[2024-09-18 11:38:35,497][03586] Loop rollout_proc2_evt_loop terminating...
[2024-09-18 11:38:35,536][00268] Component RolloutWorker_w5 stopped!
[2024-09-18 11:38:35,541][03589] Stopping RolloutWorker_w5...
[2024-09-18 11:38:35,542][03589] Loop rollout_proc5_evt_loop terminating...
[2024-09-18 11:38:35,563][00268] Component RolloutWorker_w3 stopped!
[2024-09-18 11:38:35,566][03587] Stopping RolloutWorker_w3...
[2024-09-18 11:38:35,572][03587] Loop rollout_proc3_evt_loop terminating...
[2024-09-18 11:38:35,582][00268] Component RolloutWorker_w1 stopped!
[2024-09-18 11:38:35,588][00268] Waiting for process learner_proc0 to stop...
[2024-09-18 11:38:35,592][03585] Stopping RolloutWorker_w1...
[2024-09-18 11:38:35,594][03585] Loop rollout_proc1_evt_loop terminating...
[2024-09-18 11:38:37,057][00268] Waiting for process inference_proc0-0 to join...
[2024-09-18 11:38:37,572][00268] Waiting for process rollout_proc0 to join...
[2024-09-18 11:38:37,575][00268] Waiting for process rollout_proc1 to join...
[2024-09-18 11:38:38,374][00268] Waiting for process rollout_proc2 to join...
[2024-09-18 11:38:38,378][00268] Waiting for process rollout_proc3 to join...
[2024-09-18 11:38:38,387][00268] Waiting for process rollout_proc4 to join...
[2024-09-18 11:38:38,391][00268] Waiting for process rollout_proc5 to join...
[2024-09-18 11:38:38,395][00268] Waiting for process rollout_proc6 to join...
[2024-09-18 11:38:38,397][00268] Waiting for process rollout_proc7 to join...
[2024-09-18 11:38:38,399][00268] Batcher 0 profile tree view:
batching: 24.2802, releasing_batches: 0.0240
[2024-09-18 11:38:38,400][00268] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0000
wait_policy_total: 475.0022
update_model: 8.5899
weight_update: 0.0013
one_step: 0.0141
handle_policy_step: 608.9279
deserialize: 15.9187, stack: 3.4847, obs_to_device_normalize: 130.2658, forward: 312.9496, send_messages: 22.9440
prepare_outputs: 92.1547
to_cpu: 59.4190
[2024-09-18 11:38:38,402][00268] Learner 0 profile tree view:
misc: 0.0054, prepare_batch: 15.7238
train: 70.6340
epoch_init: 0.0060, minibatch_init: 0.0069, losses_postprocess: 0.4975, kl_divergence: 0.5171, after_optimizer: 32.7668
calculate_losses: 22.8399
losses_init: 0.0044, forward_head: 1.6228, bptt_initial: 14.9823, tail: 1.0065, advantages_returns: 0.2676, losses: 2.4548
bptt: 2.1843
bptt_forward_core: 2.0930
update: 13.4511
clip: 1.3997
[2024-09-18 11:38:38,404][00268] Loop Runner_EvtLoop terminating...
[2024-09-18 11:38:38,406][00268] Runner profile tree view:
main_loop: 1158.9681
[2024-09-18 11:38:38,408][00268] Collected {0: 4005888}, FPS: 3456.4
[2024-09-18 11:38:38,688][00268] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2024-09-18 11:38:38,690][00268] Overriding arg 'num_workers' with value 1 passed from command line
[2024-09-18 11:38:38,692][00268] Adding new argument 'no_render'=True that is not in the saved config file!
[2024-09-18 11:38:38,698][00268] Adding new argument 'save_video'=True that is not in the saved config file!
[2024-09-18 11:38:38,700][00268] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2024-09-18 11:38:38,701][00268] Adding new argument 'video_name'=None that is not in the saved config file!
[2024-09-18 11:38:38,706][00268] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
[2024-09-18 11:38:38,708][00268] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2024-09-18 11:38:38,710][00268] Adding new argument 'push_to_hub'=False that is not in the saved config file!
[2024-09-18 11:38:38,711][00268] Adding new argument 'hf_repository'=None that is not in the saved config file!
[2024-09-18 11:38:38,712][00268] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2024-09-18 11:38:38,713][00268] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2024-09-18 11:38:38,714][00268] Adding new argument 'train_script'=None that is not in the saved config file!
[2024-09-18 11:38:38,715][00268] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2024-09-18 11:38:38,716][00268] Using frameskip 1 and render_action_repeat=4 for evaluation
[2024-09-18 11:38:38,740][00268] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-18 11:38:38,744][00268] RunningMeanStd input shape: (3, 72, 128)
[2024-09-18 11:38:38,747][00268] RunningMeanStd input shape: (1,)
[2024-09-18 11:38:38,763][00268] ConvEncoder: input_channels=3
[2024-09-18 11:38:38,887][00268] Conv encoder output size: 512
[2024-09-18 11:38:38,889][00268] Policy head output size: 512
[2024-09-18 11:38:40,470][00268] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2024-09-18 11:38:41,315][00268] Num frames 100...
[2024-09-18 11:38:41,433][00268] Num frames 200...
[2024-09-18 11:38:41,554][00268] Num frames 300...
[2024-09-18 11:38:41,672][00268] Num frames 400...
[2024-09-18 11:38:41,783][00268] Avg episode rewards: #0: 5.480, true rewards: #0: 4.480
[2024-09-18 11:38:41,784][00268] Avg episode reward: 5.480, avg true_objective: 4.480
[2024-09-18 11:38:41,852][00268] Num frames 500...
[2024-09-18 11:38:41,981][00268] Num frames 600...
[2024-09-18 11:38:42,106][00268] Num frames 700...
[2024-09-18 11:38:42,223][00268] Num frames 800...
[2024-09-18 11:38:42,340][00268] Num frames 900...
[2024-09-18 11:38:42,459][00268] Num frames 1000...
[2024-09-18 11:38:42,577][00268] Num frames 1100...
[2024-09-18 11:38:42,692][00268] Num frames 1200...
[2024-09-18 11:38:42,810][00268] Num frames 1300...
[2024-09-18 11:38:42,936][00268] Num frames 1400...
[2024-09-18 11:38:43,055][00268] Num frames 1500...
[2024-09-18 11:38:43,184][00268] Num frames 1600...
[2024-09-18 11:38:43,302][00268] Num frames 1700...
[2024-09-18 11:38:43,433][00268] Avg episode rewards: #0: 20.805, true rewards: #0: 8.805
[2024-09-18 11:38:43,435][00268] Avg episode reward: 20.805, avg true_objective: 8.805
[2024-09-18 11:38:43,487][00268] Num frames 1800...
[2024-09-18 11:38:43,603][00268] Num frames 1900...
[2024-09-18 11:38:43,721][00268] Num frames 2000...
[2024-09-18 11:38:43,840][00268] Num frames 2100...
[2024-09-18 11:38:43,970][00268] Num frames 2200...
[2024-09-18 11:38:44,038][00268] Avg episode rewards: #0: 15.697, true rewards: #0: 7.363
[2024-09-18 11:38:44,039][00268] Avg episode reward: 15.697, avg true_objective: 7.363
[2024-09-18 11:38:44,155][00268] Num frames 2300...
[2024-09-18 11:38:44,273][00268] Num frames 2400...
[2024-09-18 11:38:44,392][00268] Num frames 2500...
[2024-09-18 11:38:44,513][00268] Num frames 2600...
[2024-09-18 11:38:44,631][00268] Num frames 2700...
[2024-09-18 11:38:44,750][00268] Num frames 2800...
[2024-09-18 11:38:44,871][00268] Num frames 2900...
[2024-09-18 11:38:45,007][00268] Num frames 3000...
[2024-09-18 11:38:45,126][00268] Num frames 3100...
[2024-09-18 11:38:45,254][00268] Num frames 3200...
[2024-09-18 11:38:45,374][00268] Num frames 3300...
[2024-09-18 11:38:45,499][00268] Num frames 3400...
[2024-09-18 11:38:45,619][00268] Num frames 3500...
[2024-09-18 11:38:45,740][00268] Num frames 3600...
[2024-09-18 11:38:45,861][00268] Num frames 3700...
[2024-09-18 11:38:45,995][00268] Num frames 3800...
[2024-09-18 11:38:46,114][00268] Num frames 3900...
[2024-09-18 11:38:46,242][00268] Num frames 4000...
[2024-09-18 11:38:46,366][00268] Num frames 4100...
[2024-09-18 11:38:46,487][00268] Num frames 4200...
[2024-09-18 11:38:46,606][00268] Num frames 4300...
[2024-09-18 11:38:46,675][00268] Avg episode rewards: #0: 26.522, true rewards: #0: 10.773
[2024-09-18 11:38:46,677][00268] Avg episode reward: 26.522, avg true_objective: 10.773
[2024-09-18 11:38:46,786][00268] Num frames 4400...
[2024-09-18 11:38:46,918][00268] Num frames 4500...
[2024-09-18 11:38:47,045][00268] Num frames 4600...
[2024-09-18 11:38:47,164][00268] Num frames 4700...
[2024-09-18 11:38:47,297][00268] Num frames 4800...
[2024-09-18 11:38:47,417][00268] Num frames 4900...
[2024-09-18 11:38:47,550][00268] Num frames 5000...
[2024-09-18 11:38:47,698][00268] Num frames 5100...
[2024-09-18 11:38:47,868][00268] Num frames 5200...
[2024-09-18 11:38:48,041][00268] Num frames 5300...
[2024-09-18 11:38:48,204][00268] Num frames 5400...
[2024-09-18 11:38:48,378][00268] Num frames 5500...
[2024-09-18 11:38:48,545][00268] Num frames 5600...
[2024-09-18 11:38:48,707][00268] Num frames 5700...
[2024-09-18 11:38:48,872][00268] Num frames 5800...
[2024-09-18 11:38:49,004][00268] Avg episode rewards: #0: 28.280, true rewards: #0: 11.680
[2024-09-18 11:38:49,007][00268] Avg episode reward: 28.280, avg true_objective: 11.680
[2024-09-18 11:38:49,106][00268] Num frames 5900...
[2024-09-18 11:38:49,275][00268] Num frames 6000...
[2024-09-18 11:38:49,459][00268] Num frames 6100...
[2024-09-18 11:38:49,628][00268] Num frames 6200...
[2024-09-18 11:38:49,796][00268] Num frames 6300...
[2024-09-18 11:38:49,962][00268] Num frames 6400...
[2024-09-18 11:38:50,083][00268] Num frames 6500...
[2024-09-18 11:38:50,204][00268] Num frames 6600...
[2024-09-18 11:38:50,326][00268] Num frames 6700...
[2024-09-18 11:38:50,478][00268] Avg episode rewards: #0: 27.621, true rewards: #0: 11.288
[2024-09-18 11:38:50,479][00268] Avg episode reward: 27.621, avg true_objective: 11.288
[2024-09-18 11:38:50,516][00268] Num frames 6800...
[2024-09-18 11:38:50,630][00268] Num frames 6900...
[2024-09-18 11:38:50,751][00268] Num frames 7000...
[2024-09-18 11:38:50,870][00268] Num frames 7100...
[2024-09-18 11:38:50,999][00268] Num frames 7200...
[2024-09-18 11:38:51,115][00268] Num frames 7300...
[2024-09-18 11:38:51,233][00268] Num frames 7400...
[2024-09-18 11:38:51,358][00268] Num frames 7500...
[2024-09-18 11:38:51,479][00268] Num frames 7600...
[2024-09-18 11:38:51,597][00268] Num frames 7700...
[2024-09-18 11:38:51,718][00268] Num frames 7800...
[2024-09-18 11:38:51,839][00268] Num frames 7900...
[2024-09-18 11:38:51,972][00268] Num frames 8000...
[2024-09-18 11:38:52,089][00268] Num frames 8100...
[2024-09-18 11:38:52,208][00268] Num frames 8200...
[2024-09-18 11:38:52,339][00268] Num frames 8300...
[2024-09-18 11:38:52,472][00268] Num frames 8400...
[2024-09-18 11:38:52,595][00268] Num frames 8500...
[2024-09-18 11:38:52,715][00268] Num frames 8600...
[2024-09-18 11:38:52,836][00268] Num frames 8700...
[2024-09-18 11:38:52,965][00268] Num frames 8800...
[2024-09-18 11:38:53,109][00268] Avg episode rewards: #0: 32.390, true rewards: #0: 12.676
[2024-09-18 11:38:53,111][00268] Avg episode reward: 32.390, avg true_objective: 12.676
[2024-09-18 11:38:53,147][00268] Num frames 8900...
[2024-09-18 11:38:53,263][00268] Num frames 9000...
[2024-09-18 11:38:53,385][00268] Num frames 9100...
[2024-09-18 11:38:53,518][00268] Num frames 9200...
[2024-09-18 11:38:53,637][00268] Num frames 9300...
[2024-09-18 11:38:53,759][00268] Num frames 9400...
[2024-09-18 11:38:53,881][00268] Num frames 9500...
[2024-09-18 11:38:54,017][00268] Num frames 9600...
[2024-09-18 11:38:54,147][00268] Num frames 9700...
[2024-09-18 11:38:54,271][00268] Num frames 9800...
[2024-09-18 11:38:54,422][00268] Avg episode rewards: #0: 30.846, true rewards: #0: 12.346
[2024-09-18 11:38:54,424][00268] Avg episode reward: 30.846, avg true_objective: 12.346
[2024-09-18 11:38:54,465][00268] Num frames 9900...
[2024-09-18 11:38:54,581][00268] Num frames 10000...
[2024-09-18 11:38:54,697][00268] Num frames 10100...
[2024-09-18 11:38:54,818][00268] Num frames 10200...
[2024-09-18 11:38:54,945][00268] Num frames 10300...
[2024-09-18 11:38:55,063][00268] Num frames 10400...
[2024-09-18 11:38:55,179][00268] Num frames 10500...
[2024-09-18 11:38:55,298][00268] Num frames 10600...
[2024-09-18 11:38:55,416][00268] Num frames 10700...
[2024-09-18 11:38:55,548][00268] Num frames 10800...
[2024-09-18 11:38:55,665][00268] Num frames 10900...
[2024-09-18 11:38:55,788][00268] Num frames 11000...
[2024-09-18 11:38:55,944][00268] Avg episode rewards: #0: 30.316, true rewards: #0: 12.317
[2024-09-18 11:38:55,947][00268] Avg episode reward: 30.316, avg true_objective: 12.317
[2024-09-18 11:38:55,968][00268] Num frames 11100...
[2024-09-18 11:38:56,084][00268] Num frames 11200...
[2024-09-18 11:38:56,204][00268] Num frames 11300...
[2024-09-18 11:38:56,323][00268] Num frames 11400...
[2024-09-18 11:38:56,444][00268] Num frames 11500...
[2024-09-18 11:38:56,573][00268] Num frames 11600...
[2024-09-18 11:38:56,692][00268] Num frames 11700...
[2024-09-18 11:38:56,814][00268] Num frames 11800...
[2024-09-18 11:38:56,944][00268] Num frames 11900...
[2024-09-18 11:38:57,070][00268] Num frames 12000...
[2024-09-18 11:38:57,191][00268] Num frames 12100...
[2024-09-18 11:38:57,308][00268] Num frames 12200...
[2024-09-18 11:38:57,427][00268] Num frames 12300...
[2024-09-18 11:38:57,562][00268] Num frames 12400...
[2024-09-18 11:38:57,685][00268] Num frames 12500...
[2024-09-18 11:38:57,805][00268] Num frames 12600...
[2024-09-18 11:38:57,932][00268] Num frames 12700...
[2024-09-18 11:38:58,055][00268] Num frames 12800...
[2024-09-18 11:38:58,176][00268] Num frames 12900...
[2024-09-18 11:38:58,289][00268] Avg episode rewards: #0: 32.245, true rewards: #0: 12.945
[2024-09-18 11:38:58,290][00268] Avg episode reward: 32.245, avg true_objective: 12.945
[2024-09-18 11:40:20,882][00268] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
[2024-09-18 11:41:19,372][00268] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2024-09-18 11:41:19,374][00268] Overriding arg 'num_workers' with value 1 passed from command line
[2024-09-18 11:41:19,376][00268] Adding new argument 'no_render'=True that is not in the saved config file!
[2024-09-18 11:41:19,378][00268] Adding new argument 'save_video'=True that is not in the saved config file!
[2024-09-18 11:41:19,380][00268] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2024-09-18 11:41:19,382][00268] Adding new argument 'video_name'=None that is not in the saved config file!
[2024-09-18 11:41:19,385][00268] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2024-09-18 11:41:19,387][00268] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2024-09-18 11:41:19,388][00268] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2024-09-18 11:41:19,391][00268] Adding new argument 'hf_repository'='mkdem/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
[2024-09-18 11:41:19,392][00268] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2024-09-18 11:41:19,394][00268] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2024-09-18 11:41:19,395][00268] Adding new argument 'train_script'=None that is not in the saved config file!
[2024-09-18 11:41:19,396][00268] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2024-09-18 11:41:19,399][00268] Using frameskip 1 and render_action_repeat=4 for evaluation
[2024-09-18 11:41:19,410][00268] RunningMeanStd input shape: (3, 72, 128)
[2024-09-18 11:41:19,412][00268] RunningMeanStd input shape: (1,)
[2024-09-18 11:41:19,425][00268] ConvEncoder: input_channels=3
[2024-09-18 11:41:19,462][00268] Conv encoder output size: 512
[2024-09-18 11:41:19,463][00268] Policy head output size: 512
[2024-09-18 11:41:19,482][00268] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2024-09-18 11:41:20,009][00268] Num frames 100...
[2024-09-18 11:41:20,124][00268] Num frames 200...
[2024-09-18 11:41:20,243][00268] Num frames 300...
[2024-09-18 11:41:20,360][00268] Num frames 400...
[2024-09-18 11:41:20,477][00268] Num frames 500...
[2024-09-18 11:41:20,599][00268] Num frames 600...
[2024-09-18 11:41:20,717][00268] Num frames 700...
[2024-09-18 11:41:20,835][00268] Num frames 800...
[2024-09-18 11:41:20,971][00268] Num frames 900...
[2024-09-18 11:41:21,096][00268] Num frames 1000...
[2024-09-18 11:41:21,213][00268] Num frames 1100...
[2024-09-18 11:41:21,335][00268] Num frames 1200...
[2024-09-18 11:41:21,456][00268] Num frames 1300...
[2024-09-18 11:41:21,596][00268] Num frames 1400...
[2024-09-18 11:41:21,767][00268] Num frames 1500...
[2024-09-18 11:41:21,994][00268] Num frames 1600...
[2024-09-18 11:41:22,171][00268] Num frames 1700...
[2024-09-18 11:41:22,360][00268] Num frames 1800...
[2024-09-18 11:41:22,540][00268] Num frames 1900...
[2024-09-18 11:41:22,786][00268] Num frames 2000...
[2024-09-18 11:41:23,096][00268] Num frames 2100...
[2024-09-18 11:41:23,148][00268] Avg episode rewards: #0: 60.999, true rewards: #0: 21.000
[2024-09-18 11:41:23,154][00268] Avg episode reward: 60.999, avg true_objective: 21.000
[2024-09-18 11:41:23,386][00268] Num frames 2200...
[2024-09-18 11:41:23,677][00268] Num frames 2300...
[2024-09-18 11:41:24,024][00268] Num frames 2400...
[2024-09-18 11:41:24,195][00268] Num frames 2500...
[2024-09-18 11:41:24,403][00268] Num frames 2600...
[2024-09-18 11:41:24,619][00268] Num frames 2700...
[2024-09-18 11:41:24,950][00268] Num frames 2800...
[2024-09-18 11:41:25,262][00268] Num frames 2900...
[2024-09-18 11:41:25,512][00268] Num frames 3000...
[2024-09-18 11:41:25,757][00268] Avg episode rewards: #0: 43.459, true rewards: #0: 15.460
[2024-09-18 11:41:25,759][00268] Avg episode reward: 43.459, avg true_objective: 15.460
[2024-09-18 11:41:25,773][00268] Num frames 3100...
[2024-09-18 11:41:25,894][00268] Num frames 3200...
[2024-09-18 11:41:26,020][00268] Num frames 3300...
[2024-09-18 11:41:26,146][00268] Num frames 3400...
[2024-09-18 11:41:26,269][00268] Num frames 3500...
[2024-09-18 11:41:26,396][00268] Num frames 3600...
[2024-09-18 11:41:26,517][00268] Num frames 3700...
[2024-09-18 11:41:26,635][00268] Num frames 3800...
[2024-09-18 11:41:26,753][00268] Num frames 3900...
[2024-09-18 11:41:26,879][00268] Num frames 4000...
[2024-09-18 11:41:27,051][00268] Avg episode rewards: #0: 36.633, true rewards: #0: 13.633
[2024-09-18 11:41:27,053][00268] Avg episode reward: 36.633, avg true_objective: 13.633
[2024-09-18 11:41:27,068][00268] Num frames 4100...
[2024-09-18 11:41:27,198][00268] Num frames 4200...
[2024-09-18 11:41:27,369][00268] Num frames 4300...
[2024-09-18 11:41:27,539][00268] Num frames 4400...
[2024-09-18 11:41:27,698][00268] Num frames 4500...
[2024-09-18 11:41:27,870][00268] Num frames 4600...
[2024-09-18 11:41:28,042][00268] Num frames 4700...
[2024-09-18 11:41:28,206][00268] Num frames 4800...
[2024-09-18 11:41:28,374][00268] Num frames 4900...
[2024-09-18 11:41:28,470][00268] Avg episode rewards: #0: 31.555, true rewards: #0: 12.305
[2024-09-18 11:41:28,472][00268] Avg episode reward: 31.555, avg true_objective: 12.305
[2024-09-18 11:41:28,602][00268] Num frames 5000...
[2024-09-18 11:41:28,770][00268] Num frames 5100...
[2024-09-18 11:41:28,942][00268] Num frames 5200...
[2024-09-18 11:41:29,107][00268] Num frames 5300...
[2024-09-18 11:41:29,289][00268] Num frames 5400...
[2024-09-18 11:41:29,459][00268] Num frames 5500...
[2024-09-18 11:41:29,625][00268] Num frames 5600...
[2024-09-18 11:41:29,774][00268] Num frames 5700...
[2024-09-18 11:41:29,903][00268] Num frames 5800...
[2024-09-18 11:41:30,031][00268] Num frames 5900...
[2024-09-18 11:41:30,154][00268] Num frames 6000...
[2024-09-18 11:41:30,284][00268] Num frames 6100...
[2024-09-18 11:41:30,406][00268] Num frames 6200...
[2024-09-18 11:41:30,525][00268] Num frames 6300...
[2024-09-18 11:41:30,642][00268] Num frames 6400...
[2024-09-18 11:41:30,760][00268] Num frames 6500...
[2024-09-18 11:41:30,880][00268] Num frames 6600...
[2024-09-18 11:41:31,002][00268] Num frames 6700...
[2024-09-18 11:41:31,075][00268] Avg episode rewards: #0: 34.228, true rewards: #0: 13.428
[2024-09-18 11:41:31,077][00268] Avg episode reward: 34.228, avg true_objective: 13.428
[2024-09-18 11:41:31,176][00268] Num frames 6800...
[2024-09-18 11:41:31,305][00268] Num frames 6900...
[2024-09-18 11:41:31,433][00268] Num frames 7000...
[2024-09-18 11:41:31,550][00268] Num frames 7100...
[2024-09-18 11:41:31,668][00268] Num frames 7200...
[2024-09-18 11:41:31,787][00268] Num frames 7300...
[2024-09-18 11:41:31,904][00268] Num frames 7400...
[2024-09-18 11:41:32,030][00268] Num frames 7500...
[2024-09-18 11:41:32,148][00268] Num frames 7600...
[2024-09-18 11:41:32,271][00268] Num frames 7700...
[2024-09-18 11:41:32,417][00268] Num frames 7800...
[2024-09-18 11:41:32,533][00268] Num frames 7900...
[2024-09-18 11:41:32,653][00268] Num frames 8000...
[2024-09-18 11:41:32,775][00268] Num frames 8100...
[2024-09-18 11:41:32,892][00268] Num frames 8200...
[2024-09-18 11:41:33,023][00268] Num frames 8300...
[2024-09-18 11:41:33,139][00268] Num frames 8400...
[2024-09-18 11:41:33,257][00268] Num frames 8500...
[2024-09-18 11:41:33,391][00268] Num frames 8600...
[2024-09-18 11:41:33,511][00268] Num frames 8700...
[2024-09-18 11:41:33,607][00268] Avg episode rewards: #0: 38.050, true rewards: #0: 14.550
[2024-09-18 11:41:33,609][00268] Avg episode reward: 38.050, avg true_objective: 14.550
[2024-09-18 11:41:33,694][00268] Num frames 8800...
[2024-09-18 11:41:33,815][00268] Num frames 8900...
[2024-09-18 11:41:33,942][00268] Num frames 9000...
[2024-09-18 11:41:34,056][00268] Num frames 9100...
[2024-09-18 11:41:34,177][00268] Num frames 9200...
[2024-09-18 11:41:34,282][00268] Avg episode rewards: #0: 34.060, true rewards: #0: 13.203
[2024-09-18 11:41:34,284][00268] Avg episode reward: 34.060, avg true_objective: 13.203
[2024-09-18 11:41:34,358][00268] Num frames 9300...
[2024-09-18 11:41:34,496][00268] Num frames 9400...
[2024-09-18 11:41:34,616][00268] Num frames 9500...
[2024-09-18 11:41:34,730][00268] Num frames 9600...
[2024-09-18 11:41:34,851][00268] Num frames 9700...
[2024-09-18 11:41:34,974][00268] Num frames 9800...
[2024-09-18 11:41:35,094][00268] Num frames 9900...
[2024-09-18 11:41:35,213][00268] Num frames 10000...
[2024-09-18 11:41:35,330][00268] Num frames 10100...
[2024-09-18 11:41:35,464][00268] Num frames 10200...
[2024-09-18 11:41:35,582][00268] Num frames 10300...
[2024-09-18 11:41:35,702][00268] Num frames 10400...
[2024-09-18 11:41:35,827][00268] Num frames 10500...
[2024-09-18 11:41:35,954][00268] Num frames 10600...
[2024-09-18 11:41:36,076][00268] Num frames 10700...
[2024-09-18 11:41:36,191][00268] Num frames 10800...
[2024-09-18 11:41:36,315][00268] Num frames 10900...
[2024-09-18 11:41:36,451][00268] Num frames 11000...
[2024-09-18 11:41:36,607][00268] Num frames 11100...
[2024-09-18 11:41:36,767][00268] Num frames 11200...
[2024-09-18 11:41:36,932][00268] Num frames 11300...
[2024-09-18 11:41:37,057][00268] Avg episode rewards: #0: 36.802, true rewards: #0: 14.178
[2024-09-18 11:41:37,059][00268] Avg episode reward: 36.802, avg true_objective: 14.178
[2024-09-18 11:41:37,128][00268] Num frames 11400...
[2024-09-18 11:41:37,246][00268] Num frames 11500...
[2024-09-18 11:41:37,369][00268] Num frames 11600...
[2024-09-18 11:41:37,496][00268] Num frames 11700...
[2024-09-18 11:41:37,613][00268] Num frames 11800...
[2024-09-18 11:41:37,723][00268] Avg episode rewards: #0: 34.160, true rewards: #0: 13.160
[2024-09-18 11:41:37,725][00268] Avg episode reward: 34.160, avg true_objective: 13.160
[2024-09-18 11:41:37,793][00268] Num frames 11900...
[2024-09-18 11:41:37,915][00268] Num frames 12000...
[2024-09-18 11:41:38,041][00268] Num frames 12100...
[2024-09-18 11:41:38,165][00268] Num frames 12200...
[2024-09-18 11:41:38,291][00268] Num frames 12300...
[2024-09-18 11:41:38,413][00268] Num frames 12400...
[2024-09-18 11:41:38,541][00268] Num frames 12500...
[2024-09-18 11:41:38,665][00268] Num frames 12600...
[2024-09-18 11:41:38,785][00268] Num frames 12700...
[2024-09-18 11:41:38,907][00268] Num frames 12800...
[2024-09-18 11:41:39,031][00268] Num frames 12900...
[2024-09-18 11:41:39,152][00268] Num frames 13000...
[2024-09-18 11:41:39,271][00268] Num frames 13100...
[2024-09-18 11:41:39,396][00268] Num frames 13200...
[2024-09-18 11:41:39,516][00268] Avg episode rewards: #0: 34.451, true rewards: #0: 13.251
[2024-09-18 11:41:39,518][00268] Avg episode reward: 34.451, avg true_objective: 13.251
[2024-09-18 11:43:05,352][00268] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
[2024-09-18 11:43:42,216][00268] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2024-09-18 11:43:42,218][00268] Overriding arg 'num_workers' with value 1 passed from command line
[2024-09-18 11:43:42,220][00268] Adding new argument 'no_render'=True that is not in the saved config file!
[2024-09-18 11:43:42,222][00268] Adding new argument 'save_video'=True that is not in the saved config file!
[2024-09-18 11:43:42,224][00268] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2024-09-18 11:43:42,225][00268] Adding new argument 'video_name'=None that is not in the saved config file!
[2024-09-18 11:43:42,227][00268] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2024-09-18 11:43:42,229][00268] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2024-09-18 11:43:42,230][00268] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2024-09-18 11:43:42,231][00268] Adding new argument 'hf_repository'='mkdem/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
[2024-09-18 11:43:42,232][00268] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2024-09-18 11:43:42,233][00268] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2024-09-18 11:43:42,234][00268] Adding new argument 'train_script'=None that is not in the saved config file!
[2024-09-18 11:43:42,235][00268] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2024-09-18 11:43:42,236][00268] Using frameskip 1 and render_action_repeat=4 for evaluation
[2024-09-18 11:43:42,256][00268] RunningMeanStd input shape: (3, 72, 128)
[2024-09-18 11:43:42,257][00268] RunningMeanStd input shape: (1,)
[2024-09-18 11:43:42,271][00268] ConvEncoder: input_channels=3
[2024-09-18 11:43:42,308][00268] Conv encoder output size: 512
[2024-09-18 11:43:42,309][00268] Policy head output size: 512
[2024-09-18 11:43:42,328][00268] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2024-09-18 11:43:42,803][00268] Num frames 100...
[2024-09-18 11:43:42,933][00268] Num frames 200...
[2024-09-18 11:43:43,065][00268] Num frames 300...
[2024-09-18 11:43:43,185][00268] Num frames 400...
[2024-09-18 11:43:43,303][00268] Num frames 500...
[2024-09-18 11:43:43,423][00268] Num frames 600...
[2024-09-18 11:43:43,544][00268] Num frames 700...
[2024-09-18 11:43:43,671][00268] Num frames 800...
[2024-09-18 11:43:43,795][00268] Num frames 900...
[2024-09-18 11:43:43,920][00268] Num frames 1000...
[2024-09-18 11:43:44,044][00268] Num frames 1100...
[2024-09-18 11:43:44,167][00268] Num frames 1200...
[2024-09-18 11:43:44,287][00268] Num frames 1300...
[2024-09-18 11:43:44,413][00268] Num frames 1400...
[2024-09-18 11:43:44,537][00268] Num frames 1500...
[2024-09-18 11:43:44,660][00268] Num frames 1600...
[2024-09-18 11:43:44,793][00268] Num frames 1700...
[2024-09-18 11:43:44,919][00268] Num frames 1800...
[2024-09-18 11:43:45,044][00268] Num frames 1900...
[2024-09-18 11:43:45,168][00268] Num frames 2000...
[2024-09-18 11:43:45,291][00268] Num frames 2100...
[2024-09-18 11:43:45,346][00268] Avg episode rewards: #0: 58.999, true rewards: #0: 21.000
[2024-09-18 11:43:45,348][00268] Avg episode reward: 58.999, avg true_objective: 21.000
[2024-09-18 11:43:45,476][00268] Num frames 2200...
[2024-09-18 11:43:45,601][00268] Num frames 2300...
[2024-09-18 11:43:45,738][00268] Num frames 2400...
[2024-09-18 11:43:45,858][00268] Num frames 2500...
[2024-09-18 11:43:45,991][00268] Num frames 2600...
[2024-09-18 11:43:46,061][00268] Avg episode rewards: #0: 33.559, true rewards: #0: 13.060
[2024-09-18 11:43:46,063][00268] Avg episode reward: 33.559, avg true_objective: 13.060
[2024-09-18 11:43:46,166][00268] Num frames 2700...
[2024-09-18 11:43:46,281][00268] Num frames 2800...
[2024-09-18 11:43:46,400][00268] Num frames 2900...
[2024-09-18 11:43:46,515][00268] Num frames 3000...
[2024-09-18 11:43:46,638][00268] Avg episode rewards: #0: 24.200, true rewards: #0: 10.200
[2024-09-18 11:43:46,644][00268] Avg episode reward: 24.200, avg true_objective: 10.200
[2024-09-18 11:43:46,696][00268] Num frames 3100...
[2024-09-18 11:43:46,824][00268] Num frames 3200...
[2024-09-18 11:43:46,954][00268] Num frames 3300...
[2024-09-18 11:43:47,072][00268] Num frames 3400...
[2024-09-18 11:43:47,190][00268] Num frames 3500...
[2024-09-18 11:43:47,310][00268] Num frames 3600...
[2024-09-18 11:43:47,428][00268] Num frames 3700...
[2024-09-18 11:43:47,556][00268] Num frames 3800...
[2024-09-18 11:43:47,662][00268] Avg episode rewards: #0: 21.820, true rewards: #0: 9.570
[2024-09-18 11:43:47,664][00268] Avg episode reward: 21.820, avg true_objective: 9.570
[2024-09-18 11:43:47,793][00268] Num frames 3900...
[2024-09-18 11:43:47,976][00268] Num frames 4000...
[2024-09-18 11:43:48,135][00268] Num frames 4100...
[2024-09-18 11:43:48,301][00268] Num frames 4200...
[2024-09-18 11:43:48,463][00268] Num frames 4300...
[2024-09-18 11:43:48,631][00268] Num frames 4400...
[2024-09-18 11:43:48,789][00268] Num frames 4500...
[2024-09-18 11:43:48,974][00268] Num frames 4600...
[2024-09-18 11:43:49,145][00268] Num frames 4700...
[2024-09-18 11:43:49,322][00268] Num frames 4800...
[2024-09-18 11:43:49,486][00268] Num frames 4900...
[2024-09-18 11:43:49,655][00268] Num frames 5000...
[2024-09-18 11:43:49,829][00268] Num frames 5100...
[2024-09-18 11:43:50,007][00268] Num frames 5200...
[2024-09-18 11:43:50,155][00268] Num frames 5300...
[2024-09-18 11:43:50,276][00268] Num frames 5400...
[2024-09-18 11:43:50,397][00268] Num frames 5500...
[2024-09-18 11:43:50,516][00268] Num frames 5600...
[2024-09-18 11:43:50,631][00268] Num frames 5700...
[2024-09-18 11:43:50,755][00268] Num frames 5800...
[2024-09-18 11:43:50,876][00268] Num frames 5900...
[2024-09-18 11:43:50,969][00268] Avg episode rewards: #0: 29.456, true rewards: #0: 11.856
[2024-09-18 11:43:50,972][00268] Avg episode reward: 29.456, avg true_objective: 11.856
[2024-09-18 11:43:51,062][00268] Num frames 6000...
[2024-09-18 11:43:51,181][00268] Num frames 6100...
[2024-09-18 11:43:51,299][00268] Num frames 6200...
[2024-09-18 11:43:51,429][00268] Num frames 6300...
[2024-09-18 11:43:51,548][00268] Num frames 6400...
[2024-09-18 11:43:51,664][00268] Num frames 6500...
[2024-09-18 11:43:51,725][00268] Avg episode rewards: #0: 26.506, true rewards: #0: 10.840
[2024-09-18 11:43:51,726][00268] Avg episode reward: 26.506, avg true_objective: 10.840
[2024-09-18 11:43:51,843][00268] Num frames 6600...
[2024-09-18 11:43:51,979][00268] Num frames 6700...
[2024-09-18 11:43:52,100][00268] Num frames 6800...
[2024-09-18 11:43:52,216][00268] Num frames 6900...
[2024-09-18 11:43:52,331][00268] Num frames 7000...
[2024-09-18 11:43:52,452][00268] Num frames 7100...
[2024-09-18 11:43:52,595][00268] Avg episode rewards: #0: 24.394, true rewards: #0: 10.251
[2024-09-18 11:43:52,596][00268] Avg episode reward: 24.394, avg true_objective: 10.251
[2024-09-18 11:43:52,627][00268] Num frames 7200...
[2024-09-18 11:43:52,747][00268] Num frames 7300...
[2024-09-18 11:43:52,868][00268] Num frames 7400...
[2024-09-18 11:43:53,021][00268] Num frames 7500...
[2024-09-18 11:43:53,149][00268] Num frames 7600...
[2024-09-18 11:43:53,273][00268] Num frames 7700...
[2024-09-18 11:43:53,403][00268] Num frames 7800...
[2024-09-18 11:43:53,519][00268] Num frames 7900...
[2024-09-18 11:43:53,640][00268] Num frames 8000...
[2024-09-18 11:43:53,759][00268] Num frames 8100...
[2024-09-18 11:43:53,877][00268] Num frames 8200...
[2024-09-18 11:43:54,013][00268] Num frames 8300...
[2024-09-18 11:43:54,135][00268] Num frames 8400...
[2024-09-18 11:43:54,256][00268] Num frames 8500...
[2024-09-18 11:43:54,379][00268] Num frames 8600...
[2024-09-18 11:43:54,497][00268] Num frames 8700...
[2024-09-18 11:43:54,641][00268] Avg episode rewards: #0: 26.095, true rewards: #0: 10.970
[2024-09-18 11:43:54,643][00268] Avg episode reward: 26.095, avg true_objective: 10.970
[2024-09-18 11:43:54,673][00268] Num frames 8800...
[2024-09-18 11:43:54,792][00268] Num frames 8900...
[2024-09-18 11:43:54,910][00268] Num frames 9000...
[2024-09-18 11:43:55,053][00268] Num frames 9100...
[2024-09-18 11:43:55,167][00268] Num frames 9200...
[2024-09-18 11:43:55,284][00268] Num frames 9300...
[2024-09-18 11:43:55,403][00268] Num frames 9400...
[2024-09-18 11:43:55,520][00268] Num frames 9500...
[2024-09-18 11:43:55,635][00268] Num frames 9600...
[2024-09-18 11:43:55,701][00268] Avg episode rewards: #0: 25.453, true rewards: #0: 10.676
[2024-09-18 11:43:55,703][00268] Avg episode reward: 25.453, avg true_objective: 10.676
[2024-09-18 11:43:55,808][00268] Num frames 9700...
[2024-09-18 11:43:55,933][00268] Num frames 9800...
[2024-09-18 11:43:56,054][00268] Num frames 9900...
[2024-09-18 11:43:56,172][00268] Num frames 10000...
[2024-09-18 11:43:56,288][00268] Num frames 10100...
[2024-09-18 11:43:56,408][00268] Num frames 10200...
[2024-09-18 11:43:56,527][00268] Num frames 10300...
[2024-09-18 11:43:56,649][00268] Num frames 10400...
[2024-09-18 11:43:56,774][00268] Num frames 10500...
[2024-09-18 11:43:56,907][00268] Num frames 10600...
[2024-09-18 11:43:57,039][00268] Num frames 10700...
[2024-09-18 11:43:57,178][00268] Num frames 10800...
[2024-09-18 11:43:57,308][00268] Num frames 10900...
[2024-09-18 11:43:57,430][00268] Num frames 11000...
[2024-09-18 11:43:57,554][00268] Num frames 11100...
[2024-09-18 11:43:57,673][00268] Num frames 11200...
[2024-09-18 11:43:57,791][00268] Num frames 11300...
[2024-09-18 11:43:57,919][00268] Num frames 11400...
[2024-09-18 11:43:58,047][00268] Num frames 11500...
[2024-09-18 11:43:58,174][00268] Num frames 11600...
[2024-09-18 11:43:58,294][00268] Num frames 11700...
[2024-09-18 11:43:58,360][00268] Avg episode rewards: #0: 28.408, true rewards: #0: 11.708
[2024-09-18 11:43:58,361][00268] Avg episode reward: 28.408, avg true_objective: 11.708
[2024-09-18 11:45:12,588][00268] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
[2024-09-18 11:45:17,605][00268] The model has been pushed to https://huggingface.co/mkdem/rl_course_vizdoom_health_gathering_supreme
[2024-09-18 11:46:31,646][00268] Environment doom_basic already registered, overwriting...
[2024-09-18 11:46:31,649][00268] Environment doom_two_colors_easy already registered, overwriting...
[2024-09-18 11:46:31,651][00268] Environment doom_two_colors_hard already registered, overwriting...
[2024-09-18 11:46:31,654][00268] Environment doom_dm already registered, overwriting...
[2024-09-18 11:46:31,655][00268] Environment doom_dwango5 already registered, overwriting...
[2024-09-18 11:46:31,658][00268] Environment doom_my_way_home_flat_actions already registered, overwriting...
[2024-09-18 11:46:31,659][00268] Environment doom_defend_the_center_flat_actions already registered, overwriting...
[2024-09-18 11:46:31,661][00268] Environment doom_my_way_home already registered, overwriting...
[2024-09-18 11:46:31,665][00268] Environment doom_deadly_corridor already registered, overwriting...
[2024-09-18 11:46:31,667][00268] Environment doom_defend_the_center already registered, overwriting...
[2024-09-18 11:46:31,669][00268] Environment doom_defend_the_line already registered, overwriting...
[2024-09-18 11:46:31,670][00268] Environment doom_health_gathering already registered, overwriting...
[2024-09-18 11:46:31,671][00268] Environment doom_health_gathering_supreme already registered, overwriting...
[2024-09-18 11:46:31,672][00268] Environment doom_battle already registered, overwriting...
[2024-09-18 11:46:31,673][00268] Environment doom_battle2 already registered, overwriting...
[2024-09-18 11:46:31,674][00268] Environment doom_duel_bots already registered, overwriting...
[2024-09-18 11:46:31,675][00268] Environment doom_deathmatch_bots already registered, overwriting...
[2024-09-18 11:46:31,676][00268] Environment doom_duel already registered, overwriting...
[2024-09-18 11:46:31,677][00268] Environment doom_deathmatch_full already registered, overwriting...
[2024-09-18 11:46:31,678][00268] Environment doom_benchmark already registered, overwriting...
[2024-09-18 11:46:31,679][00268] register_encoder_factory: <function make_vizdoom_encoder at 0x789910de0b80>
[2024-09-18 11:46:31,704][00268] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2024-09-18 11:46:31,705][00268] Overriding arg 'train_for_env_steps' with value 8000000 passed from command line
[2024-09-18 11:46:31,712][00268] Experiment dir /content/train_dir/default_experiment already exists!
[2024-09-18 11:46:31,713][00268] Resuming existing experiment from /content/train_dir/default_experiment...
[2024-09-18 11:46:31,714][00268] Weights and Biases integration disabled
[2024-09-18 11:46:31,718][00268] Environment var CUDA_VISIBLE_DEVICES is 0
[2024-09-18 11:46:33,229][00268] Starting experiment with the following configuration:
help=False
algo=APPO
env=doom_health_gathering_supreme
experiment=default_experiment
train_dir=/content/train_dir
restart_behavior=resume
device=gpu
seed=None
num_policies=1
async_rl=True
serial_mode=False
batched_sampling=False
num_batches_to_accumulate=2
worker_num_splits=2
policy_workers_per_policy=1
max_policy_lag=1000
num_workers=8
num_envs_per_worker=4
batch_size=1024
num_batches_per_epoch=1
num_epochs=1
rollout=32
recurrence=32
shuffle_minibatches=False
gamma=0.99
reward_scale=1.0
reward_clip=1000.0
value_bootstrap=False
normalize_returns=True
exploration_loss_coeff=0.001
value_loss_coeff=0.5
kl_loss_coeff=0.0
exploration_loss=symmetric_kl
gae_lambda=0.95
ppo_clip_ratio=0.1
ppo_clip_value=0.2
with_vtrace=False
vtrace_rho=1.0
vtrace_c=1.0
optimizer=adam
adam_eps=1e-06
adam_beta1=0.9
adam_beta2=0.999
max_grad_norm=4.0
learning_rate=0.0001
lr_schedule=constant
lr_schedule_kl_threshold=0.008
lr_adaptive_min=1e-06
lr_adaptive_max=0.01
obs_subtract_mean=0.0
obs_scale=255.0
normalize_input=True
normalize_input_keys=None
decorrelate_experience_max_seconds=0
decorrelate_envs_on_one_worker=True
actor_worker_gpus=[]
set_workers_cpu_affinity=True
force_envs_single_thread=False
default_niceness=0
log_to_file=True
experiment_summaries_interval=10
flush_summaries_interval=30
stats_avg=100
summaries_use_frameskip=True
heartbeat_interval=20
heartbeat_reporting_interval=600
train_for_env_steps=8000000
train_for_seconds=10000000000
save_every_sec=120
keep_checkpoints=2
load_checkpoint_kind=latest
save_milestones_sec=-1
save_best_every_sec=5
save_best_metric=reward
save_best_after=100000
benchmark=False
encoder_mlp_layers=[512, 512]
encoder_conv_architecture=convnet_simple
encoder_conv_mlp_layers=[512]
use_rnn=True
rnn_size=512
rnn_type=gru
rnn_num_layers=1
decoder_mlp_layers=[]
nonlinearity=elu
policy_initialization=orthogonal
policy_init_gain=1.0
actor_critic_share_weights=True
adaptive_stddev=True
continuous_tanh_scale=0.0
initial_stddev=1.0
use_env_info_cache=False
env_gpu_actions=False
env_gpu_observations=True
env_frameskip=4
env_framestack=1
pixel_format=CHW
use_record_episode_statistics=False
with_wandb=False
wandb_user=None
wandb_project=sample_factory
wandb_group=None
wandb_job_type=SF
wandb_tags=[]
with_pbt=False
pbt_mix_policies_in_one_env=True
pbt_period_env_steps=5000000
pbt_start_mutation=20000000
pbt_replace_fraction=0.3
pbt_mutation_rate=0.15
pbt_replace_reward_gap=0.1
pbt_replace_reward_gap_absolute=1e-06
pbt_optimize_gamma=False
pbt_target_objective=true_objective
pbt_perturb_min=1.1
pbt_perturb_max=1.5
num_agents=-1
num_humans=0
num_bots=-1
start_bot_difficulty=None
timelimit=None
res_w=128
res_h=72
wide_aspect_ratio=False
eval_env_frameskip=1
fps=35
command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=4000000
cli_args={'env': 'doom_health_gathering_supreme', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 4000000}
git_hash=unknown
git_repo_name=not a git repository
[2024-09-18 11:46:33,231][00268] Saving configuration to /content/train_dir/default_experiment/config.json...
[2024-09-18 11:46:33,236][00268] Rollout worker 0 uses device cpu
[2024-09-18 11:46:33,237][00268] Rollout worker 1 uses device cpu
[2024-09-18 11:46:33,239][00268] Rollout worker 2 uses device cpu
[2024-09-18 11:46:33,240][00268] Rollout worker 3 uses device cpu
[2024-09-18 11:46:33,241][00268] Rollout worker 4 uses device cpu
[2024-09-18 11:46:33,242][00268] Rollout worker 5 uses device cpu
[2024-09-18 11:46:33,243][00268] Rollout worker 6 uses device cpu
[2024-09-18 11:46:33,245][00268] Rollout worker 7 uses device cpu
[2024-09-18 11:46:33,353][00268] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-18 11:46:33,354][00268] InferenceWorker_p0-w0: min num requests: 2
[2024-09-18 11:46:33,389][00268] Starting all processes...
[2024-09-18 11:46:33,391][00268] Starting process learner_proc0
[2024-09-18 11:46:33,439][00268] Starting all processes...
[2024-09-18 11:46:33,445][00268] Starting process inference_proc0-0
[2024-09-18 11:46:33,446][00268] Starting process rollout_proc0
[2024-09-18 11:46:33,452][00268] Starting process rollout_proc1
[2024-09-18 11:46:33,454][00268] Starting process rollout_proc2
[2024-09-18 11:46:33,454][00268] Starting process rollout_proc3
[2024-09-18 11:46:33,454][00268] Starting process rollout_proc4
[2024-09-18 11:46:33,454][00268] Starting process rollout_proc5
[2024-09-18 11:46:33,455][00268] Starting process rollout_proc6
[2024-09-18 11:46:33,455][00268] Starting process rollout_proc7
[2024-09-18 11:46:44,326][11908] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-18 11:46:44,329][11908] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2024-09-18 11:46:44,403][11908] Num visible devices: 1
[2024-09-18 11:46:44,441][11908] Starting seed is not provided
[2024-09-18 11:46:44,442][11908] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-18 11:46:44,443][11908] Initializing actor-critic model on device cuda:0
[2024-09-18 11:46:44,444][11908] RunningMeanStd input shape: (3, 72, 128)
[2024-09-18 11:46:44,451][11908] RunningMeanStd input shape: (1,)
[2024-09-18 11:46:44,529][11908] ConvEncoder: input_channels=3
[2024-09-18 11:46:45,289][11922] Worker 0 uses CPU cores [0]
[2024-09-18 11:46:45,419][11925] Worker 3 uses CPU cores [1]
[2024-09-18 11:46:45,473][11921] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-18 11:46:45,479][11921] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2024-09-18 11:46:45,511][11908] Conv encoder output size: 512
[2024-09-18 11:46:45,513][11908] Policy head output size: 512
[2024-09-18 11:46:45,564][11921] Num visible devices: 1
[2024-09-18 11:46:45,602][11927] Worker 6 uses CPU cores [0]
[2024-09-18 11:46:45,604][11908] Created Actor Critic model with architecture:
[2024-09-18 11:46:45,610][11908] ActorCriticSharedWeights(
(obs_normalizer): ObservationNormalizer(
(running_mean_std): RunningMeanStdDictInPlace(
(running_mean_std): ModuleDict(
(obs): RunningMeanStdInPlace()
)
)
)
(returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
(encoder): VizdoomEncoder(
(basic_encoder): ConvEncoder(
(enc): RecursiveScriptModule(
original_name=ConvEncoderImpl
(conv_head): RecursiveScriptModule(
original_name=Sequential
(0): RecursiveScriptModule(original_name=Conv2d)
(1): RecursiveScriptModule(original_name=ELU)
(2): RecursiveScriptModule(original_name=Conv2d)
(3): RecursiveScriptModule(original_name=ELU)
(4): RecursiveScriptModule(original_name=Conv2d)
(5): RecursiveScriptModule(original_name=ELU)
)
(mlp_layers): RecursiveScriptModule(
original_name=Sequential
(0): RecursiveScriptModule(original_name=Linear)
(1): RecursiveScriptModule(original_name=ELU)
)
)
)
)
(core): ModelCoreRNN(
(core): GRU(512, 512)
)
(decoder): MlpDecoder(
(mlp): Identity()
)
(critic_linear): Linear(in_features=512, out_features=1, bias=True)
(action_parameterization): ActionParameterizationDefault(
(distribution_linear): Linear(in_features=512, out_features=5, bias=True)
)
)
[2024-09-18 11:46:45,682][11928] Worker 5 uses CPU cores [1]
[2024-09-18 11:46:45,712][11924] Worker 2 uses CPU cores [0]
[2024-09-18 11:46:45,724][11929] Worker 7 uses CPU cores [1]
[2024-09-18 11:46:45,730][11926] Worker 4 uses CPU cores [0]
[2024-09-18 11:46:45,744][11923] Worker 1 uses CPU cores [1]
[2024-09-18 11:46:47,399][11908] Using optimizer <class 'torch.optim.adam.Adam'>
[2024-09-18 11:46:47,400][11908] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2024-09-18 11:46:47,436][11908] Loading model from checkpoint
[2024-09-18 11:46:47,441][11908] Loaded experiment state at self.train_step=978, self.env_steps=4005888
[2024-09-18 11:46:47,441][11908] Initialized policy 0 weights for model version 978
[2024-09-18 11:46:47,444][11908] LearnerWorker_p0 finished initialization!
[2024-09-18 11:46:47,445][11908] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-18 11:46:47,637][11921] RunningMeanStd input shape: (3, 72, 128)
[2024-09-18 11:46:47,638][11921] RunningMeanStd input shape: (1,)
[2024-09-18 11:46:47,650][11921] ConvEncoder: input_channels=3
[2024-09-18 11:46:47,750][11921] Conv encoder output size: 512
[2024-09-18 11:46:47,750][11921] Policy head output size: 512
[2024-09-18 11:46:49,350][00268] Inference worker 0-0 is ready!
[2024-09-18 11:46:49,352][00268] All inference workers are ready! Signal rollout workers to start!
[2024-09-18 11:46:49,474][11929] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-18 11:46:49,474][11923] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-18 11:46:49,480][11928] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-18 11:46:49,482][11925] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-18 11:46:49,558][11922] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-18 11:46:49,586][11927] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-18 11:46:49,588][11926] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-18 11:46:49,619][11924] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-18 11:46:50,920][11927] Decorrelating experience for 0 frames...
[2024-09-18 11:46:50,922][11926] Decorrelating experience for 0 frames...
[2024-09-18 11:46:50,924][11922] Decorrelating experience for 0 frames...
[2024-09-18 11:46:50,919][11925] Decorrelating experience for 0 frames...
[2024-09-18 11:46:50,924][11928] Decorrelating experience for 0 frames...
[2024-09-18 11:46:50,928][11929] Decorrelating experience for 0 frames...
[2024-09-18 11:46:51,688][11928] Decorrelating experience for 32 frames...
[2024-09-18 11:46:51,698][11929] Decorrelating experience for 32 frames...
[2024-09-18 11:46:51,719][00268] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 4005888. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-09-18 11:46:52,174][11922] Decorrelating experience for 32 frames...
[2024-09-18 11:46:52,180][11927] Decorrelating experience for 32 frames...
[2024-09-18 11:46:52,185][11926] Decorrelating experience for 32 frames...
[2024-09-18 11:46:53,088][11923] Decorrelating experience for 0 frames...
[2024-09-18 11:46:53,222][11929] Decorrelating experience for 64 frames...
[2024-09-18 11:46:53,265][11924] Decorrelating experience for 0 frames...
[2024-09-18 11:46:53,344][00268] Heartbeat connected on Batcher_0
[2024-09-18 11:46:53,348][00268] Heartbeat connected on LearnerWorker_p0
[2024-09-18 11:46:53,385][00268] Heartbeat connected on InferenceWorker_p0-w0
[2024-09-18 11:46:53,804][11922] Decorrelating experience for 64 frames...
[2024-09-18 11:46:53,830][11926] Decorrelating experience for 64 frames...
[2024-09-18 11:46:53,853][11928] Decorrelating experience for 64 frames...
[2024-09-18 11:46:53,948][11925] Decorrelating experience for 32 frames...
[2024-09-18 11:46:54,935][11923] Decorrelating experience for 32 frames...
[2024-09-18 11:46:55,510][11924] Decorrelating experience for 32 frames...
[2024-09-18 11:46:55,552][11927] Decorrelating experience for 64 frames...
[2024-09-18 11:46:55,697][11926] Decorrelating experience for 96 frames...
[2024-09-18 11:46:56,057][00268] Heartbeat connected on RolloutWorker_w4
[2024-09-18 11:46:56,111][11929] Decorrelating experience for 96 frames...
[2024-09-18 11:46:56,709][00268] Heartbeat connected on RolloutWorker_w7
[2024-09-18 11:46:56,719][00268] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4005888. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-09-18 11:46:56,884][11925] Decorrelating experience for 64 frames...
[2024-09-18 11:46:57,644][11922] Decorrelating experience for 96 frames...
[2024-09-18 11:46:57,927][11927] Decorrelating experience for 96 frames...
[2024-09-18 11:46:58,075][00268] Heartbeat connected on RolloutWorker_w0
[2024-09-18 11:46:58,433][00268] Heartbeat connected on RolloutWorker_w6
[2024-09-18 11:46:58,565][11923] Decorrelating experience for 64 frames...
[2024-09-18 11:47:00,133][11925] Decorrelating experience for 96 frames...
[2024-09-18 11:47:00,378][00268] Heartbeat connected on RolloutWorker_w3
[2024-09-18 11:47:00,402][11928] Decorrelating experience for 96 frames...
[2024-09-18 11:47:01,119][00268] Heartbeat connected on RolloutWorker_w5
[2024-09-18 11:47:01,719][00268] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4005888. Throughput: 0: 174.4. Samples: 1744. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-09-18 11:47:01,722][00268] Avg episode reward: [(0, '3.475')]
[2024-09-18 11:47:02,068][11924] Decorrelating experience for 64 frames...
[2024-09-18 11:47:02,479][11908] Signal inference workers to stop experience collection...
[2024-09-18 11:47:02,499][11921] InferenceWorker_p0-w0: stopping experience collection
[2024-09-18 11:47:02,837][11923] Decorrelating experience for 96 frames...
[2024-09-18 11:47:03,015][11924] Decorrelating experience for 96 frames...
[2024-09-18 11:47:03,044][00268] Heartbeat connected on RolloutWorker_w1
[2024-09-18 11:47:03,086][00268] Heartbeat connected on RolloutWorker_w2
[2024-09-18 11:47:04,497][11908] Signal inference workers to resume experience collection...
[2024-09-18 11:47:04,499][11921] InferenceWorker_p0-w0: resuming experience collection
[2024-09-18 11:47:06,719][00268] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 4018176. Throughput: 0: 162.5. Samples: 2438. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2024-09-18 11:47:06,726][00268] Avg episode reward: [(0, '4.331')]
[2024-09-18 11:47:11,719][00268] Fps is (10 sec: 2867.2, 60 sec: 1433.6, 300 sec: 1433.6). Total num frames: 4034560. Throughput: 0: 384.9. Samples: 7698. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:47:11,727][00268] Avg episode reward: [(0, '10.969')]
[2024-09-18 11:47:15,667][11921] Updated weights for policy 0, policy_version 988 (0.0379)
[2024-09-18 11:47:16,722][00268] Fps is (10 sec: 2866.2, 60 sec: 1638.2, 300 sec: 1638.2). Total num frames: 4046848. Throughput: 0: 456.0. Samples: 11402. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-18 11:47:16,725][00268] Avg episode reward: [(0, '15.098')]
[2024-09-18 11:47:21,719][00268] Fps is (10 sec: 3276.8, 60 sec: 2048.0, 300 sec: 2048.0). Total num frames: 4067328. Throughput: 0: 454.8. Samples: 13644. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-18 11:47:21,724][00268] Avg episode reward: [(0, '17.551')]
[2024-09-18 11:47:26,089][11921] Updated weights for policy 0, policy_version 998 (0.0023)
[2024-09-18 11:47:26,719][00268] Fps is (10 sec: 4097.5, 60 sec: 2340.6, 300 sec: 2340.6). Total num frames: 4087808. Throughput: 0: 576.7. Samples: 20184. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-18 11:47:26,726][00268] Avg episode reward: [(0, '18.977')]
[2024-09-18 11:47:31,724][00268] Fps is (10 sec: 3684.5, 60 sec: 2457.3, 300 sec: 2457.3). Total num frames: 4104192. Throughput: 0: 642.8. Samples: 25716. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-18 11:47:31,727][00268] Avg episode reward: [(0, '19.347')]
[2024-09-18 11:47:36,719][00268] Fps is (10 sec: 3276.8, 60 sec: 2548.6, 300 sec: 2548.6). Total num frames: 4120576. Throughput: 0: 617.2. Samples: 27776. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-18 11:47:36,721][00268] Avg episode reward: [(0, '21.375')]
[2024-09-18 11:47:38,323][11921] Updated weights for policy 0, policy_version 1008 (0.0022)
[2024-09-18 11:47:41,719][00268] Fps is (10 sec: 3688.2, 60 sec: 2703.4, 300 sec: 2703.4). Total num frames: 4141056. Throughput: 0: 745.3. Samples: 33540. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-18 11:47:41,721][00268] Avg episode reward: [(0, '23.461')]
[2024-09-18 11:47:46,720][00268] Fps is (10 sec: 4095.4, 60 sec: 2829.9, 300 sec: 2829.9). Total num frames: 4161536. Throughput: 0: 848.6. Samples: 39930. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-18 11:47:46,723][00268] Avg episode reward: [(0, '23.648')]
[2024-09-18 11:47:48,762][11921] Updated weights for policy 0, policy_version 1018 (0.0025)
[2024-09-18 11:47:51,719][00268] Fps is (10 sec: 3686.4, 60 sec: 2867.2, 300 sec: 2867.2). Total num frames: 4177920. Throughput: 0: 877.6. Samples: 41928. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-18 11:47:51,723][00268] Avg episode reward: [(0, '23.841')]
[2024-09-18 11:47:56,719][00268] Fps is (10 sec: 3277.3, 60 sec: 3140.3, 300 sec: 2898.7). Total num frames: 4194304. Throughput: 0: 868.5. Samples: 46782. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-18 11:47:56,726][00268] Avg episode reward: [(0, '23.940')]
[2024-09-18 11:47:59,826][11921] Updated weights for policy 0, policy_version 1028 (0.0017)
[2024-09-18 11:48:01,719][00268] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 2984.2). Total num frames: 4214784. Throughput: 0: 933.2. Samples: 53394. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-09-18 11:48:01,725][00268] Avg episode reward: [(0, '25.074')]
[2024-09-18 11:48:06,719][00268] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3003.7). Total num frames: 4231168. Throughput: 0: 948.6. Samples: 56332. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-18 11:48:06,724][00268] Avg episode reward: [(0, '25.138')]
[2024-09-18 11:48:11,719][00268] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3020.8). Total num frames: 4247552. Throughput: 0: 892.9. Samples: 60364. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-09-18 11:48:11,721][00268] Avg episode reward: [(0, '24.553')]
[2024-09-18 11:48:12,035][11921] Updated weights for policy 0, policy_version 1038 (0.0027)
[2024-09-18 11:48:16,719][00268] Fps is (10 sec: 4096.0, 60 sec: 3754.9, 300 sec: 3132.2). Total num frames: 4272128. Throughput: 0: 911.2. Samples: 66716. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-18 11:48:16,721][00268] Avg episode reward: [(0, '26.331')]
[2024-09-18 11:48:21,647][11921] Updated weights for policy 0, policy_version 1048 (0.0012)
[2024-09-18 11:48:21,720][00268] Fps is (10 sec: 4505.3, 60 sec: 3754.6, 300 sec: 3185.7). Total num frames: 4292608. Throughput: 0: 938.4. Samples: 70004. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-18 11:48:21,726][00268] Avg episode reward: [(0, '25.587')]
[2024-09-18 11:48:26,721][00268] Fps is (10 sec: 3276.0, 60 sec: 3618.0, 300 sec: 3147.4). Total num frames: 4304896. Throughput: 0: 913.6. Samples: 74652. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-18 11:48:26,727][00268] Avg episode reward: [(0, '24.711')]
[2024-09-18 11:48:31,719][00268] Fps is (10 sec: 3276.9, 60 sec: 3686.7, 300 sec: 3194.9). Total num frames: 4325376. Throughput: 0: 894.0. Samples: 80160. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-18 11:48:31,723][00268] Avg episode reward: [(0, '25.028')]
[2024-09-18 11:48:31,734][11908] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001056_4325376.pth...
[2024-09-18 11:48:31,889][11908] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000910_3727360.pth
[2024-09-18 11:48:33,602][11921] Updated weights for policy 0, policy_version 1058 (0.0019)
[2024-09-18 11:48:36,719][00268] Fps is (10 sec: 4096.9, 60 sec: 3754.7, 300 sec: 3237.8). Total num frames: 4345856. Throughput: 0: 918.5. Samples: 83262. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-18 11:48:36,721][00268] Avg episode reward: [(0, '25.041')]
[2024-09-18 11:48:41,724][00268] Fps is (10 sec: 3684.7, 60 sec: 3686.1, 300 sec: 3239.4). Total num frames: 4362240. Throughput: 0: 932.8. Samples: 88762. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-18 11:48:41,736][00268] Avg episode reward: [(0, '25.959')]
[2024-09-18 11:48:45,767][11921] Updated weights for policy 0, policy_version 1068 (0.0017)
[2024-09-18 11:48:46,719][00268] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3205.6). Total num frames: 4374528. Throughput: 0: 887.6. Samples: 93336. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-18 11:48:46,722][00268] Avg episode reward: [(0, '25.444')]
[2024-09-18 11:48:51,719][00268] Fps is (10 sec: 3688.3, 60 sec: 3686.4, 300 sec: 3276.8). Total num frames: 4399104. Throughput: 0: 895.6. Samples: 96632. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-18 11:48:51,726][00268] Avg episode reward: [(0, '25.901')]
[2024-09-18 11:48:55,463][11921] Updated weights for policy 0, policy_version 1078 (0.0018)
[2024-09-18 11:48:56,720][00268] Fps is (10 sec: 4095.4, 60 sec: 3686.3, 300 sec: 3276.8). Total num frames: 4415488. Throughput: 0: 948.5. Samples: 103048. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-18 11:48:56,727][00268] Avg episode reward: [(0, '26.604')]
[2024-09-18 11:49:01,719][00268] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3276.8). Total num frames: 4431872. Throughput: 0: 898.6. Samples: 107154. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-18 11:49:01,724][00268] Avg episode reward: [(0, '28.095')]
[2024-09-18 11:49:01,733][11908] Saving new best policy, reward=28.095!
[2024-09-18 11:49:06,719][00268] Fps is (10 sec: 3687.0, 60 sec: 3686.4, 300 sec: 3307.1). Total num frames: 4452352. Throughput: 0: 886.7. Samples: 109904. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-18 11:49:06,722][00268] Avg episode reward: [(0, '28.559')]
[2024-09-18 11:49:06,725][11908] Saving new best policy, reward=28.559!
[2024-09-18 11:49:07,583][11921] Updated weights for policy 0, policy_version 1088 (0.0025)
[2024-09-18 11:49:11,719][00268] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3335.3). Total num frames: 4472832. Throughput: 0: 926.6. Samples: 116348. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-18 11:49:11,721][00268] Avg episode reward: [(0, '26.669')]
[2024-09-18 11:49:16,722][00268] Fps is (10 sec: 3685.1, 60 sec: 3617.9, 300 sec: 3333.2). Total num frames: 4489216. Throughput: 0: 913.1. Samples: 121250. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-18 11:49:16,725][00268] Avg episode reward: [(0, '25.952')]
[2024-09-18 11:49:19,341][11921] Updated weights for policy 0, policy_version 1098 (0.0032)
[2024-09-18 11:49:21,720][00268] Fps is (10 sec: 3276.3, 60 sec: 3549.8, 300 sec: 3331.4). Total num frames: 4505600. Throughput: 0: 888.6. Samples: 123250. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-18 11:49:21,723][00268] Avg episode reward: [(0, '24.703')]
[2024-09-18 11:49:26,719][00268] Fps is (10 sec: 3687.7, 60 sec: 3686.5, 300 sec: 3356.1). Total num frames: 4526080. Throughput: 0: 910.1. Samples: 129712. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-09-18 11:49:26,721][00268] Avg episode reward: [(0, '23.417')]
[2024-09-18 11:49:28,880][11921] Updated weights for policy 0, policy_version 1108 (0.0012)
[2024-09-18 11:49:31,719][00268] Fps is (10 sec: 4096.6, 60 sec: 3686.4, 300 sec: 3379.2). Total num frames: 4546560. Throughput: 0: 940.9. Samples: 135676. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-18 11:49:31,723][00268] Avg episode reward: [(0, '20.791')]
[2024-09-18 11:49:36,719][00268] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3351.3). Total num frames: 4558848. Throughput: 0: 912.2. Samples: 137682. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-09-18 11:49:36,722][00268] Avg episode reward: [(0, '19.184')]
[2024-09-18 11:49:40,969][11921] Updated weights for policy 0, policy_version 1118 (0.0012)
[2024-09-18 11:49:41,719][00268] Fps is (10 sec: 3276.8, 60 sec: 3618.4, 300 sec: 3373.2). Total num frames: 4579328. Throughput: 0: 893.0. Samples: 143230. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-18 11:49:41,728][00268] Avg episode reward: [(0, '19.430')]
[2024-09-18 11:49:46,719][00268] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3417.2). Total num frames: 4603904. Throughput: 0: 949.7. Samples: 149890. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-18 11:49:46,725][00268] Avg episode reward: [(0, '20.006')]
[2024-09-18 11:49:51,725][00268] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3390.6). Total num frames: 4616192. Throughput: 0: 936.0. Samples: 152022. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-18 11:49:51,728][00268] Avg episode reward: [(0, '21.042')]
[2024-09-18 11:49:52,336][11921] Updated weights for policy 0, policy_version 1128 (0.0045)
[2024-09-18 11:49:56,719][00268] Fps is (10 sec: 2867.2, 60 sec: 3618.2, 300 sec: 3387.5). Total num frames: 4632576. Throughput: 0: 894.8. Samples: 156612. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-18 11:49:56,724][00268] Avg episode reward: [(0, '22.710')]
[2024-09-18 11:50:01,719][00268] Fps is (10 sec: 4095.9, 60 sec: 3754.6, 300 sec: 3427.7). Total num frames: 4657152. Throughput: 0: 933.9. Samples: 163274. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-18 11:50:01,726][00268] Avg episode reward: [(0, '25.060')]
[2024-09-18 11:50:02,465][11921] Updated weights for policy 0, policy_version 1138 (0.0013)
[2024-09-18 11:50:06,719][00268] Fps is (10 sec: 4095.9, 60 sec: 3686.4, 300 sec: 3423.8). Total num frames: 4673536. Throughput: 0: 958.7. Samples: 166392. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-18 11:50:06,725][00268] Avg episode reward: [(0, '26.301')]
[2024-09-18 11:50:11,719][00268] Fps is (10 sec: 2867.3, 60 sec: 3549.9, 300 sec: 3399.7). Total num frames: 4685824. Throughput: 0: 905.6. Samples: 170462. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-18 11:50:11,721][00268] Avg episode reward: [(0, '27.010')]
[2024-09-18 11:50:14,728][11921] Updated weights for policy 0, policy_version 1148 (0.0027)
[2024-09-18 11:50:16,719][00268] Fps is (10 sec: 3686.4, 60 sec: 3686.6, 300 sec: 3436.6). Total num frames: 4710400. Throughput: 0: 907.2. Samples: 176498. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-18 11:50:16,723][00268] Avg episode reward: [(0, '28.325')]
[2024-09-18 11:50:21,719][00268] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3452.3). Total num frames: 4730880. Throughput: 0: 931.6. Samples: 179602. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-18 11:50:21,723][00268] Avg episode reward: [(0, '28.817')]
[2024-09-18 11:50:21,736][11908] Saving new best policy, reward=28.817!
[2024-09-18 11:50:26,219][11921] Updated weights for policy 0, policy_version 1158 (0.0019)
[2024-09-18 11:50:26,719][00268] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3429.2). Total num frames: 4743168. Throughput: 0: 913.9. Samples: 184356. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-18 11:50:26,722][00268] Avg episode reward: [(0, '29.033')]
[2024-09-18 11:50:26,727][11908] Saving new best policy, reward=29.033!
[2024-09-18 11:50:31,719][00268] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3425.7). Total num frames: 4759552. Throughput: 0: 877.0. Samples: 189354. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-18 11:50:31,722][00268] Avg episode reward: [(0, '30.088')]
[2024-09-18 11:50:31,730][11908] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001162_4759552.pth...
[2024-09-18 11:50:31,901][11908] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth
[2024-09-18 11:50:31,919][11908] Saving new best policy, reward=30.088!
[2024-09-18 11:50:36,719][00268] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3440.6). Total num frames: 4780032. Throughput: 0: 896.5. Samples: 192364. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-18 11:50:36,722][00268] Avg episode reward: [(0, '29.569')]
[2024-09-18 11:50:37,058][11921] Updated weights for policy 0, policy_version 1168 (0.0030)
[2024-09-18 11:50:41,725][00268] Fps is (10 sec: 3684.1, 60 sec: 3617.7, 300 sec: 3437.0). Total num frames: 4796416. Throughput: 0: 922.0. Samples: 198108. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-18 11:50:41,732][00268] Avg episode reward: [(0, '29.314')]
[2024-09-18 11:50:46,719][00268] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3433.7). Total num frames: 4812800. Throughput: 0: 865.7. Samples: 202230. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-18 11:50:46,722][00268] Avg episode reward: [(0, '27.605')]
[2024-09-18 11:50:49,498][11921] Updated weights for policy 0, policy_version 1178 (0.0014)
[2024-09-18 11:50:51,719][00268] Fps is (10 sec: 3688.8, 60 sec: 3618.1, 300 sec: 3447.5). Total num frames: 4833280. Throughput: 0: 863.9. Samples: 205266. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-18 11:50:51,724][00268] Avg episode reward: [(0, '27.196')]
[2024-09-18 11:50:56,719][00268] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3460.7). Total num frames: 4853760. Throughput: 0: 920.2. Samples: 211870. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-09-18 11:50:56,725][00268] Avg episode reward: [(0, '24.836')]
[2024-09-18 11:51:00,346][11921] Updated weights for policy 0, policy_version 1188 (0.0015)
[2024-09-18 11:51:01,719][00268] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3440.6). Total num frames: 4866048. Throughput: 0: 882.3. Samples: 216200. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-18 11:51:01,721][00268] Avg episode reward: [(0, '23.741')]
[2024-09-18 11:51:06,719][00268] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3453.5). Total num frames: 4886528. Throughput: 0: 865.7. Samples: 218560. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-18 11:51:06,725][00268] Avg episode reward: [(0, '22.872')]
[2024-09-18 11:51:11,084][11921] Updated weights for policy 0, policy_version 1198 (0.0021)
[2024-09-18 11:51:11,719][00268] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3465.8). Total num frames: 4907008. Throughput: 0: 906.5. Samples: 225150. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-18 11:51:11,725][00268] Avg episode reward: [(0, '24.063')]
[2024-09-18 11:51:16,719][00268] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3462.3). Total num frames: 4923392. Throughput: 0: 913.6. Samples: 230464. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-18 11:51:16,723][00268] Avg episode reward: [(0, '22.917')]
[2024-09-18 11:51:21,719][00268] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3458.8). Total num frames: 4939776. Throughput: 0: 890.2. Samples: 232424. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-18 11:51:21,723][00268] Avg episode reward: [(0, '23.143')]
[2024-09-18 11:51:23,297][11921] Updated weights for policy 0, policy_version 1208 (0.0027)
[2024-09-18 11:51:26,719][00268] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3470.4). Total num frames: 4960256. Throughput: 0: 895.2. Samples: 238386. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-18 11:51:26,726][00268] Avg episode reward: [(0, '24.554')]
[2024-09-18 11:51:31,720][00268] Fps is (10 sec: 4095.4, 60 sec: 3686.3, 300 sec: 3481.6). Total num frames: 4980736. Throughput: 0: 946.1. Samples: 244806. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-18 11:51:31,725][00268] Avg episode reward: [(0, '27.471')]
[2024-09-18 11:51:33,757][11921] Updated weights for policy 0, policy_version 1218 (0.0015)
[2024-09-18 11:51:36,726][00268] Fps is (10 sec: 3683.7, 60 sec: 3617.7, 300 sec: 3477.9). Total num frames: 4997120. Throughput: 0: 920.9. Samples: 246714. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-18 11:51:36,729][00268] Avg episode reward: [(0, '28.113')]
[2024-09-18 11:51:41,719][00268] Fps is (10 sec: 3277.2, 60 sec: 3618.5, 300 sec: 3474.5). Total num frames: 5013504. Throughput: 0: 888.7. Samples: 251860. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-18 11:51:41,730][00268] Avg episode reward: [(0, '26.407')]
[2024-09-18 11:51:44,646][11921] Updated weights for policy 0, policy_version 1228 (0.0017)
[2024-09-18 11:51:46,719][00268] Fps is (10 sec: 4099.0, 60 sec: 3754.7, 300 sec: 3499.0). Total num frames: 5038080. Throughput: 0: 939.8. Samples: 258490. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-18 11:51:46,721][00268] Avg episode reward: [(0, '28.514')]
[2024-09-18 11:51:51,719][00268] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3540.6). Total num frames: 5050368. Throughput: 0: 944.6. Samples: 261068. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-18 11:51:51,722][00268] Avg episode reward: [(0, '29.487')]
[2024-09-18 11:51:56,719][00268] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3596.1). Total num frames: 5066752. Throughput: 0: 888.5. Samples: 265134. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-18 11:51:56,723][00268] Avg episode reward: [(0, '30.170')]
[2024-09-18 11:51:56,727][11908] Saving new best policy, reward=30.170!
[2024-09-18 11:51:57,077][11921] Updated weights for policy 0, policy_version 1238 (0.0026)
[2024-09-18 11:52:01,719][00268] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3623.9). Total num frames: 5087232. Throughput: 0: 914.9. Samples: 271634. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-18 11:52:01,721][00268] Avg episode reward: [(0, '28.542')]
[2024-09-18 11:52:06,719][00268] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3637.8). Total num frames: 5107712. Throughput: 0: 944.9. Samples: 274946. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-18 11:52:06,728][00268] Avg episode reward: [(0, '28.131')]
[2024-09-18 11:52:06,902][11921] Updated weights for policy 0, policy_version 1248 (0.0015)
[2024-09-18 11:52:11,719][00268] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3651.7). Total num frames: 5124096. Throughput: 0: 911.0. Samples: 279380. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-18 11:52:11,722][00268] Avg episode reward: [(0, '28.898')]
[2024-09-18 11:52:16,719][00268] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3651.7). Total num frames: 5144576. Throughput: 0: 894.2. Samples: 285044. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-18 11:52:16,722][00268] Avg episode reward: [(0, '29.064')]
[2024-09-18 11:52:18,514][11921] Updated weights for policy 0, policy_version 1258 (0.0019)
[2024-09-18 11:52:21,719][00268] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3651.7). Total num frames: 5165056. Throughput: 0: 920.4. Samples: 288126. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-18 11:52:21,723][00268] Avg episode reward: [(0, '27.645')]
[2024-09-18 11:52:26,719][00268] Fps is (10 sec: 3686.3, 60 sec: 3686.4, 300 sec: 3651.7). Total num frames: 5181440. Throughput: 0: 927.0. Samples: 293576. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-09-18 11:52:26,728][00268] Avg episode reward: [(0, '26.540')]
[2024-09-18 11:52:30,732][11921] Updated weights for policy 0, policy_version 1268 (0.0016)
[2024-09-18 11:52:31,719][00268] Fps is (10 sec: 3276.8, 60 sec: 3618.2, 300 sec: 3651.7). Total num frames: 5197824. Throughput: 0: 886.1. Samples: 298366. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-18 11:52:31,725][00268] Avg episode reward: [(0, '28.213')]
[2024-09-18 11:52:31,734][11908] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001269_5197824.pth...
[2024-09-18 11:52:31,936][11908] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001056_4325376.pth
[2024-09-18 11:52:36,719][00268] Fps is (10 sec: 3686.5, 60 sec: 3686.9, 300 sec: 3651.7). Total num frames: 5218304. Throughput: 0: 899.4. Samples: 301542. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-18 11:52:36,721][00268] Avg episode reward: [(0, '27.068')]
[2024-09-18 11:52:39,984][11921] Updated weights for policy 0, policy_version 1278 (0.0013)
[2024-09-18 11:52:41,720][00268] Fps is (10 sec: 4095.5, 60 sec: 3754.6, 300 sec: 3651.7). Total num frames: 5238784. Throughput: 0: 951.7. Samples: 307960. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-18 11:52:41,727][00268] Avg episode reward: [(0, '25.255')]
[2024-09-18 11:52:46,719][00268] Fps is (10 sec: 3276.7, 60 sec: 3549.9, 300 sec: 3637.8). Total num frames: 5251072. Throughput: 0: 899.1. Samples: 312092. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-18 11:52:46,722][00268] Avg episode reward: [(0, '25.490')]
[2024-09-18 11:52:51,719][00268] Fps is (10 sec: 3277.2, 60 sec: 3686.4, 300 sec: 3651.7). Total num frames: 5271552. Throughput: 0: 892.8. Samples: 315120. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-18 11:52:51,722][00268] Avg episode reward: [(0, '26.781')]
[2024-09-18 11:52:51,930][11921] Updated weights for policy 0, policy_version 1288 (0.0012)
[2024-09-18 11:52:56,719][00268] Fps is (10 sec: 4505.7, 60 sec: 3822.9, 300 sec: 3665.6). Total num frames: 5296128. Throughput: 0: 946.5. Samples: 321974. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-18 11:52:56,722][00268] Avg episode reward: [(0, '26.026')]
[2024-09-18 11:53:01,720][00268] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3651.7). Total num frames: 5308416. Throughput: 0: 923.8. Samples: 326616. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-18 11:53:01,724][00268] Avg episode reward: [(0, '26.079')]
[2024-09-18 11:53:03,766][11921] Updated weights for policy 0, policy_version 1298 (0.0017)
[2024-09-18 11:53:06,719][00268] Fps is (10 sec: 3276.7, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 5328896. Throughput: 0: 902.8. Samples: 328752. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-18 11:53:06,722][00268] Avg episode reward: [(0, '26.119')]
[2024-09-18 11:53:11,719][00268] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3651.7). Total num frames: 5349376. Throughput: 0: 928.9. Samples: 335374. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-18 11:53:11,728][00268] Avg episode reward: [(0, '28.086')]
[2024-09-18 11:53:13,350][11921] Updated weights for policy 0, policy_version 1308 (0.0022)
[2024-09-18 11:53:16,719][00268] Fps is (10 sec: 3686.5, 60 sec: 3686.4, 300 sec: 3637.8). Total num frames: 5365760. Throughput: 0: 945.7. Samples: 340924. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-18 11:53:16,721][00268] Avg episode reward: [(0, '28.089')]
[2024-09-18 11:53:21,719][00268] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3651.7). Total num frames: 5382144. Throughput: 0: 918.9. Samples: 342894. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-18 11:53:21,721][00268] Avg episode reward: [(0, '26.776')]
[2024-09-18 11:53:25,449][11921] Updated weights for policy 0, policy_version 1318 (0.0013)
[2024-09-18 11:53:26,719][00268] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3651.7). Total num frames: 5402624. Throughput: 0: 905.4. Samples: 348704. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-18 11:53:26,727][00268] Avg episode reward: [(0, '27.871')]
[2024-09-18 11:53:31,719][00268] Fps is (10 sec: 4095.9, 60 sec: 3754.7, 300 sec: 3651.7). Total num frames: 5423104. Throughput: 0: 960.0. Samples: 355292. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-18 11:53:31,724][00268] Avg episode reward: [(0, '28.868')]
[2024-09-18 11:53:36,644][11921] Updated weights for policy 0, policy_version 1328 (0.0041)
[2024-09-18 11:53:36,719][00268] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3651.8). Total num frames: 5439488. Throughput: 0: 937.7. Samples: 357318. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-18 11:53:36,720][00268] Avg episode reward: [(0, '28.193')]
[2024-09-18 11:53:41,719][00268] Fps is (10 sec: 3276.7, 60 sec: 3618.2, 300 sec: 3665.6). Total num frames: 5455872. Throughput: 0: 892.0. Samples: 362116. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-18 11:53:41,722][00268] Avg episode reward: [(0, '29.277')]
[2024-09-18 11:53:46,586][11921] Updated weights for policy 0, policy_version 1338 (0.0013)
[2024-09-18 11:53:46,719][00268] Fps is (10 sec: 4096.0, 60 sec: 3823.0, 300 sec: 3665.6). Total num frames: 5480448. Throughput: 0: 938.9. Samples: 368866. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-18 11:53:46,723][00268] Avg episode reward: [(0, '29.732')]
[2024-09-18 11:53:51,719][00268] Fps is (10 sec: 4096.1, 60 sec: 3754.7, 300 sec: 3665.6). Total num frames: 5496832. Throughput: 0: 956.8. Samples: 371808. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-18 11:53:51,721][00268] Avg episode reward: [(0, '29.132')]
[2024-09-18 11:53:56,719][00268] Fps is (10 sec: 2867.1, 60 sec: 3549.9, 300 sec: 3651.7). Total num frames: 5509120. Throughput: 0: 900.1. Samples: 375880. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-18 11:53:56,721][00268] Avg episode reward: [(0, '29.381')]
[2024-09-18 11:53:58,845][11921] Updated weights for policy 0, policy_version 1348 (0.0026)
[2024-09-18 11:54:01,719][00268] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3665.6). Total num frames: 5533696. Throughput: 0: 916.1. Samples: 382150. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-18 11:54:01,723][00268] Avg episode reward: [(0, '28.556')]
[2024-09-18 11:54:06,719][00268] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3665.6). Total num frames: 5554176. Throughput: 0: 947.1. Samples: 385514. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-18 11:54:06,726][00268] Avg episode reward: [(0, '27.738')]
[2024-09-18 11:54:09,442][11921] Updated weights for policy 0, policy_version 1358 (0.0016)
[2024-09-18 11:54:11,719][00268] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3651.7). Total num frames: 5566464. Throughput: 0: 926.4. Samples: 390394. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-18 11:54:11,729][00268] Avg episode reward: [(0, '27.041')]
[2024-09-18 11:54:16,719][00268] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 5586944. Throughput: 0: 897.7. Samples: 395690. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-18 11:54:16,723][00268] Avg episode reward: [(0, '25.506')]
[2024-09-18 11:54:20,289][11921] Updated weights for policy 0, policy_version 1368 (0.0022)
[2024-09-18 11:54:21,719][00268] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3665.6). Total num frames: 5607424. Throughput: 0: 924.0. Samples: 398898. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:54:21,721][00268] Avg episode reward: [(0, '27.194')]
[2024-09-18 11:54:26,721][00268] Fps is (10 sec: 3685.6, 60 sec: 3686.3, 300 sec: 3651.7). Total num frames: 5623808. Throughput: 0: 942.5. Samples: 404532. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-18 11:54:26,728][00268] Avg episode reward: [(0, '26.821')]
[2024-09-18 11:54:31,719][00268] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3665.6). Total num frames: 5640192. Throughput: 0: 889.3. Samples: 408884. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-09-18 11:54:31,723][00268] Avg episode reward: [(0, '26.285')]
[2024-09-18 11:54:31,732][11908] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001377_5640192.pth...
[2024-09-18 11:54:31,927][11908] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001162_4759552.pth
[2024-09-18 11:54:32,600][11921] Updated weights for policy 0, policy_version 1378 (0.0014)
[2024-09-18 11:54:36,719][00268] Fps is (10 sec: 3687.2, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 5660672. Throughput: 0: 894.4. Samples: 412058. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-18 11:54:36,724][00268] Avg episode reward: [(0, '27.335')]
[2024-09-18 11:54:41,719][00268] Fps is (10 sec: 4095.8, 60 sec: 3754.7, 300 sec: 3651.7). Total num frames: 5681152. Throughput: 0: 950.2. Samples: 418638. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-18 11:54:41,722][00268] Avg episode reward: [(0, '27.669')]
[2024-09-18 11:54:42,427][11921] Updated weights for policy 0, policy_version 1388 (0.0013)
[2024-09-18 11:54:46,724][00268] Fps is (10 sec: 3275.0, 60 sec: 3549.5, 300 sec: 3651.6). Total num frames: 5693440. Throughput: 0: 905.1. Samples: 422884. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-18 11:54:46,727][00268] Avg episode reward: [(0, '27.314')]
[2024-09-18 11:54:51,719][00268] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3665.6). Total num frames: 5713920. Throughput: 0: 887.5. Samples: 425454. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-18 11:54:51,723][00268] Avg episode reward: [(0, '28.240')]
[2024-09-18 11:54:54,215][11921] Updated weights for policy 0, policy_version 1398 (0.0021)
[2024-09-18 11:54:56,719][00268] Fps is (10 sec: 4098.2, 60 sec: 3754.7, 300 sec: 3651.7). Total num frames: 5734400. Throughput: 0: 922.3. Samples: 431896. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-18 11:54:56,721][00268] Avg episode reward: [(0, '28.285')]
[2024-09-18 11:55:01,719][00268] Fps is (10 sec: 3686.6, 60 sec: 3618.1, 300 sec: 3651.7). Total num frames: 5750784. Throughput: 0: 918.4. Samples: 437020. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-18 11:55:01,724][00268] Avg episode reward: [(0, '29.069')]
[2024-09-18 11:55:06,379][11921] Updated weights for policy 0, policy_version 1408 (0.0020)
[2024-09-18 11:55:06,719][00268] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3665.6). Total num frames: 5767168. Throughput: 0: 893.3. Samples: 439096. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-18 11:55:06,721][00268] Avg episode reward: [(0, '28.113')]
[2024-09-18 11:55:11,719][00268] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3651.7). Total num frames: 5787648. Throughput: 0: 903.2. Samples: 445172. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-18 11:55:11,722][00268] Avg episode reward: [(0, '27.883')]
[2024-09-18 11:55:15,716][11921] Updated weights for policy 0, policy_version 1418 (0.0023)
[2024-09-18 11:55:16,719][00268] Fps is (10 sec: 4096.1, 60 sec: 3686.4, 300 sec: 3651.7). Total num frames: 5808128. Throughput: 0: 946.0. Samples: 451454. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-18 11:55:16,721][00268] Avg episode reward: [(0, '28.492')]
[2024-09-18 11:55:21,721][00268] Fps is (10 sec: 3276.0, 60 sec: 3549.7, 300 sec: 3651.7). Total num frames: 5820416. Throughput: 0: 918.5. Samples: 453392. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-18 11:55:21,726][00268] Avg episode reward: [(0, '28.751')]
[2024-09-18 11:55:26,719][00268] Fps is (10 sec: 3276.8, 60 sec: 3618.3, 300 sec: 3665.6). Total num frames: 5840896. Throughput: 0: 882.8. Samples: 458364. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-18 11:55:26,726][00268] Avg episode reward: [(0, '27.881')]
[2024-09-18 11:55:28,081][11921] Updated weights for policy 0, policy_version 1428 (0.0019)
[2024-09-18 11:55:31,719][00268] Fps is (10 sec: 4097.0, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 5861376. Throughput: 0: 935.0. Samples: 464952. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-18 11:55:31,721][00268] Avg episode reward: [(0, '26.046')]
[2024-09-18 11:55:36,719][00268] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3665.7). Total num frames: 5877760. Throughput: 0: 935.1. Samples: 467534. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-18 11:55:36,730][00268] Avg episode reward: [(0, '26.107')]
[2024-09-18 11:55:40,230][11921] Updated weights for policy 0, policy_version 1438 (0.0029)
[2024-09-18 11:55:41,719][00268] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3665.6). Total num frames: 5894144. Throughput: 0: 885.5. Samples: 471744. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-18 11:55:41,721][00268] Avg episode reward: [(0, '25.899')]
[2024-09-18 11:55:46,719][00268] Fps is (10 sec: 4096.0, 60 sec: 3755.0, 300 sec: 3679.5). Total num frames: 5918720. Throughput: 0: 920.4. Samples: 478436. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-18 11:55:46,722][00268] Avg episode reward: [(0, '24.725')]
[2024-09-18 11:55:49,442][11921] Updated weights for policy 0, policy_version 1448 (0.0020)
[2024-09-18 11:55:51,719][00268] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 5935104. Throughput: 0: 948.1. Samples: 481760. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-18 11:55:51,721][00268] Avg episode reward: [(0, '24.213')]
[2024-09-18 11:55:56,719][00268] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3665.6). Total num frames: 5947392. Throughput: 0: 906.2. Samples: 485950. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-18 11:55:56,721][00268] Avg episode reward: [(0, '24.600')]
[2024-09-18 11:56:01,540][11921] Updated weights for policy 0, policy_version 1458 (0.0013)
[2024-09-18 11:56:01,719][00268] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3679.5). Total num frames: 5971968. Throughput: 0: 897.2. Samples: 491830. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-09-18 11:56:01,725][00268] Avg episode reward: [(0, '25.053')]
[2024-09-18 11:56:06,719][00268] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3679.5). Total num frames: 5992448. Throughput: 0: 925.3. Samples: 495030. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-18 11:56:06,721][00268] Avg episode reward: [(0, '26.158')]
[2024-09-18 11:56:11,719][00268] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3679.5). Total num frames: 6008832. Throughput: 0: 932.2. Samples: 500312. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-18 11:56:11,722][00268] Avg episode reward: [(0, '25.424')]
[2024-09-18 11:56:13,036][11921] Updated weights for policy 0, policy_version 1468 (0.0018)
[2024-09-18 11:56:16,719][00268] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3679.5). Total num frames: 6025216. Throughput: 0: 894.4. Samples: 505198. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-18 11:56:16,725][00268] Avg episode reward: [(0, '25.783')]
[2024-09-18 11:56:21,719][00268] Fps is (10 sec: 3686.4, 60 sec: 3754.8, 300 sec: 3679.5). Total num frames: 6045696. Throughput: 0: 909.6. Samples: 508468. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-18 11:56:21,722][00268] Avg episode reward: [(0, '25.508')]
[2024-09-18 11:56:23,290][11921] Updated weights for policy 0, policy_version 1478 (0.0012)
[2024-09-18 11:56:26,719][00268] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 6062080. Throughput: 0: 952.0. Samples: 514586. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-18 11:56:26,722][00268] Avg episode reward: [(0, '25.471')]
[2024-09-18 11:56:31,723][00268] Fps is (10 sec: 3275.4, 60 sec: 3617.9, 300 sec: 3665.6). Total num frames: 6078464. Throughput: 0: 893.6. Samples: 518654. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-18 11:56:31,726][00268] Avg episode reward: [(0, '25.458')]
[2024-09-18 11:56:31,738][11908] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001484_6078464.pth...
[2024-09-18 11:56:31,944][11908] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001269_5197824.pth
[2024-09-18 11:56:35,487][11921] Updated weights for policy 0, policy_version 1488 (0.0025)
[2024-09-18 11:56:36,719][00268] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3679.5). Total num frames: 6098944. Throughput: 0: 886.8. Samples: 521664. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-18 11:56:36,726][00268] Avg episode reward: [(0, '25.154')]
[2024-09-18 11:56:41,719][00268] Fps is (10 sec: 4097.8, 60 sec: 3754.7, 300 sec: 3665.6). Total num frames: 6119424. Throughput: 0: 940.1. Samples: 528254. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-18 11:56:41,727][00268] Avg episode reward: [(0, '25.652')]
[2024-09-18 11:56:46,432][11921] Updated weights for policy 0, policy_version 1498 (0.0013)
[2024-09-18 11:56:46,719][00268] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3679.5). Total num frames: 6135808. Throughput: 0: 914.3. Samples: 532974. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-18 11:56:46,725][00268] Avg episode reward: [(0, '25.958')]
[2024-09-18 11:56:51,719][00268] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3679.5). Total num frames: 6152192. Throughput: 0: 892.0. Samples: 535170. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-18 11:56:51,721][00268] Avg episode reward: [(0, '25.891')]
[2024-09-18 11:56:56,719][00268] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3679.5). Total num frames: 6172672. Throughput: 0: 919.7. Samples: 541700. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-18 11:56:56,721][00268] Avg episode reward: [(0, '25.102')]
[2024-09-18 11:56:56,814][11921] Updated weights for policy 0, policy_version 1508 (0.0013)
[2024-09-18 11:57:01,719][00268] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3679.5). Total num frames: 6193152. Throughput: 0: 934.8. Samples: 547264. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-18 11:57:01,726][00268] Avg episode reward: [(0, '24.813')]
[2024-09-18 11:57:06,719][00268] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3665.6). Total num frames: 6205440. Throughput: 0: 907.3. Samples: 549298. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-18 11:57:06,723][00268] Avg episode reward: [(0, '24.918')]
[2024-09-18 11:57:08,828][11921] Updated weights for policy 0, policy_version 1518 (0.0014)
[2024-09-18 11:57:11,719][00268] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3679.5). Total num frames: 6230016. Throughput: 0: 903.0. Samples: 555220. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-18 11:57:11,723][00268] Avg episode reward: [(0, '25.126')]
[2024-09-18 11:57:16,719][00268] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3679.5). Total num frames: 6250496. Throughput: 0: 957.8. Samples: 561750. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-18 11:57:16,724][00268] Avg episode reward: [(0, '25.219')]
[2024-09-18 11:57:19,548][11921] Updated weights for policy 0, policy_version 1528 (0.0018)
[2024-09-18 11:57:21,719][00268] Fps is (10 sec: 3276.6, 60 sec: 3618.1, 300 sec: 3665.6). Total num frames: 6262784. Throughput: 0: 936.1. Samples: 563790. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-18 11:57:21,722][00268] Avg episode reward: [(0, '26.143')]
[2024-09-18 11:57:26,719][00268] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3679.5). Total num frames: 6283264. Throughput: 0: 896.6. Samples: 568602. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-18 11:57:26,721][00268] Avg episode reward: [(0, '25.709')]
[2024-09-18 11:57:30,341][11921] Updated weights for policy 0, policy_version 1538 (0.0018)
[2024-09-18 11:57:31,719][00268] Fps is (10 sec: 4096.2, 60 sec: 3754.9, 300 sec: 3679.5). Total num frames: 6303744. Throughput: 0: 939.8. Samples: 575264. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-18 11:57:31,724][00268] Avg episode reward: [(0, '27.262')]
[2024-09-18 11:57:36,719][00268] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 6320128. Throughput: 0: 955.5. Samples: 578168. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-18 11:57:36,721][00268] Avg episode reward: [(0, '27.389')]
[2024-09-18 11:57:41,719][00268] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3679.5). Total num frames: 6336512. Throughput: 0: 900.4. Samples: 582218. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-18 11:57:41,726][00268] Avg episode reward: [(0, '29.126')]
[2024-09-18 11:57:42,303][11921] Updated weights for policy 0, policy_version 1548 (0.0015)
[2024-09-18 11:57:46,719][00268] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3679.5). Total num frames: 6356992. Throughput: 0: 922.9. Samples: 588794. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-18 11:57:46,721][00268] Avg episode reward: [(0, '28.139')]
[2024-09-18 11:57:51,722][00268] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3665.6). Total num frames: 6377472. Throughput: 0: 949.3. Samples: 592016. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-09-18 11:57:51,725][00268] Avg episode reward: [(0, '27.480')]
[2024-09-18 11:57:52,203][11921] Updated weights for policy 0, policy_version 1558 (0.0017)
[2024-09-18 11:57:56,719][00268] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3679.5). Total num frames: 6393856. Throughput: 0: 918.1. Samples: 596536. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-18 11:57:56,723][00268] Avg episode reward: [(0, '27.492')]
[2024-09-18 11:58:01,719][00268] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3665.6). Total num frames: 6410240. Throughput: 0: 895.1. Samples: 602028. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-18 11:58:01,722][00268] Avg episode reward: [(0, '28.131')]
[2024-09-18 11:58:03,804][11921] Updated weights for policy 0, policy_version 1568 (0.0017)
[2024-09-18 11:58:06,719][00268] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3679.5). Total num frames: 6434816. Throughput: 0: 923.4. Samples: 605344. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-18 11:58:06,725][00268] Avg episode reward: [(0, '27.295')]
[2024-09-18 11:58:11,719][00268] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3679.5). Total num frames: 6451200. Throughput: 0: 941.2. Samples: 610956. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-18 11:58:11,726][00268] Avg episode reward: [(0, '26.648')]
[2024-09-18 11:58:15,727][11921] Updated weights for policy 0, policy_version 1578 (0.0027)
[2024-09-18 11:58:16,719][00268] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3679.5). Total num frames: 6467584. Throughput: 0: 897.3. Samples: 615644. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-18 11:58:16,726][00268] Avg episode reward: [(0, '27.698')]
[2024-09-18 11:58:21,719][00268] Fps is (10 sec: 3686.3, 60 sec: 3754.7, 300 sec: 3679.5). Total num frames: 6488064. Throughput: 0: 905.5. Samples: 618916. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-18 11:58:21,721][00268] Avg episode reward: [(0, '27.417')]
[2024-09-18 11:58:25,116][11921] Updated weights for policy 0, policy_version 1588 (0.0015)
[2024-09-18 11:58:26,720][00268] Fps is (10 sec: 4095.4, 60 sec: 3754.6, 300 sec: 3679.4). Total num frames: 6508544. Throughput: 0: 959.3. Samples: 625386. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-18 11:58:26,726][00268] Avg episode reward: [(0, '27.626')]
[2024-09-18 11:58:31,719][00268] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3665.6). Total num frames: 6520832. Throughput: 0: 903.8. Samples: 629466. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-18 11:58:31,723][00268] Avg episode reward: [(0, '26.817')]
[2024-09-18 11:58:31,742][11908] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001592_6520832.pth...
[2024-09-18 11:58:32,021][11908] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001377_5640192.pth
[2024-09-18 11:58:36,719][00268] Fps is (10 sec: 3277.3, 60 sec: 3686.4, 300 sec: 3679.5). Total num frames: 6541312. Throughput: 0: 894.2. Samples: 632254. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-18 11:58:36,726][00268] Avg episode reward: [(0, '26.392')]
[2024-09-18 11:58:37,298][11921] Updated weights for policy 0, policy_version 1598 (0.0017)
[2024-09-18 11:58:41,719][00268] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3665.6). Total num frames: 6561792. Throughput: 0: 940.7. Samples: 638868. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-18 11:58:41,721][00268] Avg episode reward: [(0, '26.572')]
[2024-09-18 11:58:46,719][00268] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 6578176. Throughput: 0: 927.4. Samples: 643762. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-18 11:58:46,724][00268] Avg episode reward: [(0, '25.712')]
[2024-09-18 11:58:49,050][11921] Updated weights for policy 0, policy_version 1608 (0.0043)
[2024-09-18 11:58:51,719][00268] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3679.5). Total num frames: 6594560. Throughput: 0: 900.4. Samples: 645864. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-18 11:58:51,723][00268] Avg episode reward: [(0, '25.829')]
[2024-09-18 11:58:56,719][00268] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 6615040. Throughput: 0: 913.0. Samples: 652040. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-18 11:58:56,721][00268] Avg episode reward: [(0, '26.750')]
[2024-09-18 11:58:58,901][11921] Updated weights for policy 0, policy_version 1618 (0.0013)
[2024-09-18 11:59:01,720][00268] Fps is (10 sec: 4095.4, 60 sec: 3754.6, 300 sec: 3665.6). Total num frames: 6635520. Throughput: 0: 939.3. Samples: 657912. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-18 11:59:01,723][00268] Avg episode reward: [(0, '26.893')]
[2024-09-18 11:59:06,719][00268] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3665.6). Total num frames: 6647808. Throughput: 0: 912.0. Samples: 659958. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-18 11:59:06,721][00268] Avg episode reward: [(0, '26.881')]
[2024-09-18 11:59:11,097][11921] Updated weights for policy 0, policy_version 1628 (0.0042)
[2024-09-18 11:59:11,719][00268] Fps is (10 sec: 3277.2, 60 sec: 3618.1, 300 sec: 3665.6). Total num frames: 6668288. Throughput: 0: 888.7. Samples: 665378. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-18 11:59:11,721][00268] Avg episode reward: [(0, '26.499')]
[2024-09-18 11:59:16,719][00268] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3679.5). Total num frames: 6692864. Throughput: 0: 945.5. Samples: 672012. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-18 11:59:16,724][00268] Avg episode reward: [(0, '26.382')]
[2024-09-18 11:59:21,719][00268] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3665.6). Total num frames: 6705152. Throughput: 0: 934.9. Samples: 674324. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-18 11:59:21,726][00268] Avg episode reward: [(0, '26.072')]
[2024-09-18 11:59:22,342][11921] Updated weights for policy 0, policy_version 1638 (0.0036)
[2024-09-18 11:59:26,719][00268] Fps is (10 sec: 2867.2, 60 sec: 3550.0, 300 sec: 3665.6). Total num frames: 6721536. Throughput: 0: 885.5. Samples: 678716. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-18 11:59:26,722][00268] Avg episode reward: [(0, '24.927')]
[2024-09-18 11:59:31,719][00268] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3679.5). Total num frames: 6746112. Throughput: 0: 923.4. Samples: 685316. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-18 11:59:31,721][00268] Avg episode reward: [(0, '25.593')]
[2024-09-18 11:59:32,637][11921] Updated weights for policy 0, policy_version 1648 (0.0024)
[2024-09-18 11:59:36,721][00268] Fps is (10 sec: 4095.1, 60 sec: 3686.3, 300 sec: 3665.6). Total num frames: 6762496. Throughput: 0: 949.6. Samples: 688596. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-18 11:59:36,725][00268] Avg episode reward: [(0, '25.252')]
[2024-09-18 11:59:41,719][00268] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3665.6). Total num frames: 6774784. Throughput: 0: 904.6. Samples: 692746. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-18 11:59:41,722][00268] Avg episode reward: [(0, '23.549')]
[2024-09-18 11:59:44,670][11921] Updated weights for policy 0, policy_version 1658 (0.0018)
[2024-09-18 11:59:46,719][00268] Fps is (10 sec: 3687.2, 60 sec: 3686.4, 300 sec: 3679.5). Total num frames: 6799360. Throughput: 0: 908.4. Samples: 698788. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-18 11:59:46,725][00268] Avg episode reward: [(0, '23.246')]
[2024-09-18 11:59:51,719][00268] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3679.5). Total num frames: 6819840. Throughput: 0: 936.0. Samples: 702076. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-18 11:59:51,725][00268] Avg episode reward: [(0, '23.312')]
[2024-09-18 11:59:55,189][11921] Updated weights for policy 0, policy_version 1668 (0.0016)
[2024-09-18 11:59:56,722][00268] Fps is (10 sec: 3685.1, 60 sec: 3686.2, 300 sec: 3679.4). Total num frames: 6836224. Throughput: 0: 929.8. Samples: 707222. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-18 11:59:56,725][00268] Avg episode reward: [(0, '23.432')]
[2024-09-18 12:00:01,719][00268] Fps is (10 sec: 3276.8, 60 sec: 3618.2, 300 sec: 3679.5). Total num frames: 6852608. Throughput: 0: 893.9. Samples: 712238. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-18 12:00:01,725][00268] Avg episode reward: [(0, '24.060')]
[2024-09-18 12:00:06,063][11921] Updated weights for policy 0, policy_version 1678 (0.0020)
[2024-09-18 12:00:06,719][00268] Fps is (10 sec: 3687.7, 60 sec: 3754.7, 300 sec: 3679.5). Total num frames: 6873088. Throughput: 0: 914.8. Samples: 715490. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-18 12:00:06,721][00268] Avg episode reward: [(0, '25.224')]
[2024-09-18 12:00:11,719][00268] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3679.5). Total num frames: 6893568. Throughput: 0: 954.0. Samples: 721648. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-18 12:00:11,725][00268] Avg episode reward: [(0, '26.514')]
[2024-09-18 12:00:16,719][00268] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3679.5). Total num frames: 6905856. Throughput: 0: 900.5. Samples: 725840. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-18 12:00:16,722][00268] Avg episode reward: [(0, '26.208')]
[2024-09-18 12:00:17,979][11921] Updated weights for policy 0, policy_version 1688 (0.0038)
[2024-09-18 12:00:21,719][00268] Fps is (10 sec: 3686.3, 60 sec: 3754.7, 300 sec: 3693.3). Total num frames: 6930432. Throughput: 0: 902.7. Samples: 729214. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-18 12:00:21,726][00268] Avg episode reward: [(0, '26.966')]
[2024-09-18 12:00:26,719][00268] Fps is (10 sec: 4505.4, 60 sec: 3822.9, 300 sec: 3693.3). Total num frames: 6950912. Throughput: 0: 953.2. Samples: 735642. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-18 12:00:26,725][00268] Avg episode reward: [(0, '27.033')]
[2024-09-18 12:00:28,095][11921] Updated weights for policy 0, policy_version 1698 (0.0017)
[2024-09-18 12:00:31,723][00268] Fps is (10 sec: 3275.4, 60 sec: 3617.9, 300 sec: 3679.4). Total num frames: 6963200. Throughput: 0: 918.2. Samples: 740110. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-18 12:00:31,726][00268] Avg episode reward: [(0, '27.927')]
[2024-09-18 12:00:31,741][11908] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001700_6963200.pth...
[2024-09-18 12:00:32,020][11908] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001484_6078464.pth
[2024-09-18 12:00:36,719][00268] Fps is (10 sec: 2867.4, 60 sec: 3618.3, 300 sec: 3679.5). Total num frames: 6979584. Throughput: 0: 895.0. Samples: 742350. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-18 12:00:36,721][00268] Avg episode reward: [(0, '27.250')]
[2024-09-18 12:00:39,691][11921] Updated weights for policy 0, policy_version 1708 (0.0017)
[2024-09-18 12:00:41,719][00268] Fps is (10 sec: 4097.8, 60 sec: 3822.9, 300 sec: 3679.5). Total num frames: 7004160. Throughput: 0: 928.5. Samples: 749000. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-18 12:00:41,725][00268] Avg episode reward: [(0, '28.422')]
[2024-09-18 12:00:46,719][00268] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3679.5). Total num frames: 7020544. Throughput: 0: 937.1. Samples: 754408. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-18 12:00:46,728][00268] Avg episode reward: [(0, '29.734')]
[2024-09-18 12:00:51,452][11921] Updated weights for policy 0, policy_version 1718 (0.0019)
[2024-09-18 12:00:51,719][00268] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3693.3). Total num frames: 7036928. Throughput: 0: 911.4. Samples: 756504. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-18 12:00:51,723][00268] Avg episode reward: [(0, '29.012')]
[2024-09-18 12:00:56,719][00268] Fps is (10 sec: 3686.3, 60 sec: 3686.6, 300 sec: 3679.5). Total num frames: 7057408. Throughput: 0: 906.3. Samples: 762432. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-18 12:00:56,722][00268] Avg episode reward: [(0, '29.362')]
[2024-09-18 12:01:00,979][11921] Updated weights for policy 0, policy_version 1728 (0.0014)
[2024-09-18 12:01:01,723][00268] Fps is (10 sec: 4094.2, 60 sec: 3754.4, 300 sec: 3679.4). Total num frames: 7077888. Throughput: 0: 954.3. Samples: 768788. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-09-18 12:01:01,736][00268] Avg episode reward: [(0, '27.385')]
[2024-09-18 12:01:06,719][00268] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3665.6). Total num frames: 7090176. Throughput: 0: 923.9. Samples: 770788. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-18 12:01:06,721][00268] Avg episode reward: [(0, '27.252')]
[2024-09-18 12:01:11,719][00268] Fps is (10 sec: 3278.1, 60 sec: 3618.1, 300 sec: 3679.5). Total num frames: 7110656. Throughput: 0: 893.1. Samples: 775832. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-18 12:01:11,726][00268] Avg episode reward: [(0, '26.320')]
[2024-09-18 12:01:13,033][11921] Updated weights for policy 0, policy_version 1738 (0.0022)
[2024-09-18 12:01:16,719][00268] Fps is (10 sec: 4095.9, 60 sec: 3754.7, 300 sec: 3679.5). Total num frames: 7131136. Throughput: 0: 943.1. Samples: 782546. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-18 12:01:16,726][00268] Avg episode reward: [(0, '25.834')]
[2024-09-18 12:01:21,721][00268] Fps is (10 sec: 3685.8, 60 sec: 3618.0, 300 sec: 3679.4). Total num frames: 7147520. Throughput: 0: 955.6. Samples: 785354. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-18 12:01:21,725][00268] Avg episode reward: [(0, '25.833')]
[2024-09-18 12:01:24,770][11921] Updated weights for policy 0, policy_version 1748 (0.0041)
[2024-09-18 12:01:26,719][00268] Fps is (10 sec: 3276.9, 60 sec: 3549.9, 300 sec: 3679.5). Total num frames: 7163904. Throughput: 0: 899.3. Samples: 789470. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-18 12:01:26,724][00268] Avg episode reward: [(0, '27.265')]
[2024-09-18 12:01:31,719][00268] Fps is (10 sec: 4096.8, 60 sec: 3755.0, 300 sec: 3693.3). Total num frames: 7188480. Throughput: 0: 923.4. Samples: 795962. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-18 12:01:31,722][00268] Avg episode reward: [(0, '26.010')]
[2024-09-18 12:01:34,509][11921] Updated weights for policy 0, policy_version 1758 (0.0015)
[2024-09-18 12:01:36,719][00268] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3693.3). Total num frames: 7208960. Throughput: 0: 949.2. Samples: 799216. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-18 12:01:36,721][00268] Avg episode reward: [(0, '27.834')]
[2024-09-18 12:01:41,719][00268] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3679.5). Total num frames: 7221248. Throughput: 0: 920.9. Samples: 803872. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-18 12:01:41,725][00268] Avg episode reward: [(0, '28.086')]
[2024-09-18 12:01:46,488][11921] Updated weights for policy 0, policy_version 1768 (0.0017)
[2024-09-18 12:01:46,719][00268] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3693.3). Total num frames: 7241728. Throughput: 0: 902.3. Samples: 809388. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-18 12:01:46,721][00268] Avg episode reward: [(0, '28.723')]
[2024-09-18 12:01:51,719][00268] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3693.3). Total num frames: 7262208. Throughput: 0: 930.0. Samples: 812638. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-18 12:01:51,721][00268] Avg episode reward: [(0, '27.993')]
[2024-09-18 12:01:56,719][00268] Fps is (10 sec: 3686.2, 60 sec: 3686.4, 300 sec: 3679.5). Total num frames: 7278592. Throughput: 0: 939.9. Samples: 818126. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-18 12:01:56,724][00268] Avg episode reward: [(0, '27.581')]
[2024-09-18 12:01:57,608][11921] Updated weights for policy 0, policy_version 1778 (0.0019)
[2024-09-18 12:02:01,719][00268] Fps is (10 sec: 3276.8, 60 sec: 3618.4, 300 sec: 3693.3). Total num frames: 7294976. Throughput: 0: 891.9. Samples: 822682. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-09-18 12:02:01,721][00268] Avg episode reward: [(0, '28.306')]
[2024-09-18 12:02:06,719][00268] Fps is (10 sec: 3686.6, 60 sec: 3754.7, 300 sec: 3679.5). Total num frames: 7315456. Throughput: 0: 902.2. Samples: 825952. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-18 12:02:06,725][00268] Avg episode reward: [(0, '28.468')]
[2024-09-18 12:02:07,951][11921] Updated weights for policy 0, policy_version 1788 (0.0020)
[2024-09-18 12:02:11,719][00268] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3679.5). Total num frames: 7335936. Throughput: 0: 957.7. Samples: 832568. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-18 12:02:11,726][00268] Avg episode reward: [(0, '28.431')]
[2024-09-18 12:02:16,720][00268] Fps is (10 sec: 3276.4, 60 sec: 3618.1, 300 sec: 3679.4). Total num frames: 7348224. Throughput: 0: 902.8. Samples: 836590. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-18 12:02:16,728][00268] Avg episode reward: [(0, '27.077')]
[2024-09-18 12:02:20,131][11921] Updated weights for policy 0, policy_version 1798 (0.0027)
[2024-09-18 12:02:21,719][00268] Fps is (10 sec: 3276.8, 60 sec: 3686.5, 300 sec: 3679.5). Total num frames: 7368704. Throughput: 0: 892.6. Samples: 839384. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-18 12:02:21,721][00268] Avg episode reward: [(0, '27.229')]
[2024-09-18 12:02:26,719][00268] Fps is (10 sec: 4096.5, 60 sec: 3754.7, 300 sec: 3679.5). Total num frames: 7389184. Throughput: 0: 935.7. Samples: 845978. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-18 12:02:26,722][00268] Avg episode reward: [(0, '27.974')]
[2024-09-18 12:02:30,900][11921] Updated weights for policy 0, policy_version 1808 (0.0013)
[2024-09-18 12:02:31,719][00268] Fps is (10 sec: 3686.2, 60 sec: 3618.1, 300 sec: 3679.5). Total num frames: 7405568. Throughput: 0: 920.6. Samples: 850816. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-18 12:02:31,725][00268] Avg episode reward: [(0, '27.048')]
[2024-09-18 12:02:31,739][11908] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001808_7405568.pth...
[2024-09-18 12:02:31,971][11908] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001592_6520832.pth
[2024-09-18 12:02:36,719][00268] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3679.5). Total num frames: 7421952. Throughput: 0: 891.7. Samples: 852766. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-18 12:02:36,721][00268] Avg episode reward: [(0, '26.089')]
[2024-09-18 12:02:41,719][00268] Fps is (10 sec: 3686.6, 60 sec: 3686.4, 300 sec: 3679.5). Total num frames: 7442432. Throughput: 0: 911.3. Samples: 859134. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-18 12:02:41,726][00268] Avg episode reward: [(0, '25.247')]
[2024-09-18 12:02:41,947][11921] Updated weights for policy 0, policy_version 1818 (0.0013)
[2024-09-18 12:02:46,719][00268] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3679.5). Total num frames: 7462912. Throughput: 0: 941.4. Samples: 865046. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-18 12:02:46,729][00268] Avg episode reward: [(0, '27.200')]
[2024-09-18 12:02:51,719][00268] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3665.6). Total num frames: 7475200. Throughput: 0: 914.2. Samples: 867090. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-18 12:02:51,721][00268] Avg episode reward: [(0, '27.240')]
[2024-09-18 12:02:53,848][11921] Updated weights for policy 0, policy_version 1828 (0.0018)
[2024-09-18 12:02:56,719][00268] Fps is (10 sec: 3276.8, 60 sec: 3618.2, 300 sec: 3679.5). Total num frames: 7495680. Throughput: 0: 888.2. Samples: 872536. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-09-18 12:02:56,723][00268] Avg episode reward: [(0, '27.196')]
[2024-09-18 12:03:01,719][00268] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3679.5). Total num frames: 7520256. Throughput: 0: 944.0. Samples: 879070. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-18 12:03:01,723][00268] Avg episode reward: [(0, '27.551')]
[2024-09-18 12:03:04,409][11921] Updated weights for policy 0, policy_version 1838 (0.0033)
[2024-09-18 12:03:06,719][00268] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3665.6). Total num frames: 7532544. Throughput: 0: 932.0. Samples: 881324. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-18 12:03:06,726][00268] Avg episode reward: [(0, '28.536')]
[2024-09-18 12:03:11,719][00268] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3679.5). Total num frames: 7553024. Throughput: 0: 890.2. Samples: 886038. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-18 12:03:11,726][00268] Avg episode reward: [(0, '28.465')]
[2024-09-18 12:03:15,285][11921] Updated weights for policy 0, policy_version 1848 (0.0015)
[2024-09-18 12:03:16,719][00268] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3679.5). Total num frames: 7573504. Throughput: 0: 933.5. Samples: 892824. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-18 12:03:16,728][00268] Avg episode reward: [(0, '28.093')]
[2024-09-18 12:03:21,719][00268] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3679.5). Total num frames: 7593984. Throughput: 0: 965.3. Samples: 896206. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-18 12:03:21,721][00268] Avg episode reward: [(0, '28.711')]
[2024-09-18 12:03:26,719][00268] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3679.5). Total num frames: 7606272. Throughput: 0: 912.7. Samples: 900206. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-18 12:03:26,721][00268] Avg episode reward: [(0, '26.209')]
[2024-09-18 12:03:27,488][11921] Updated weights for policy 0, policy_version 1858 (0.0020)
[2024-09-18 12:03:31,719][00268] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3679.5). Total num frames: 7626752. Throughput: 0: 910.4. Samples: 906014. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-18 12:03:31,722][00268] Avg episode reward: [(0, '27.397')]
[2024-09-18 12:03:36,722][00268] Fps is (10 sec: 4094.7, 60 sec: 3754.5, 300 sec: 3679.4). Total num frames: 7647232. Throughput: 0: 936.9. Samples: 909254. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-18 12:03:36,724][00268] Avg episode reward: [(0, '26.876')]
[2024-09-18 12:03:36,769][11921] Updated weights for policy 0, policy_version 1868 (0.0024)
[2024-09-18 12:03:41,719][00268] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3679.5). Total num frames: 7663616. Throughput: 0: 928.6. Samples: 914324. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-18 12:03:41,725][00268] Avg episode reward: [(0, '26.087')]
[2024-09-18 12:03:46,720][00268] Fps is (10 sec: 3277.4, 60 sec: 3618.0, 300 sec: 3679.4). Total num frames: 7680000. Throughput: 0: 896.7. Samples: 919422. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-18 12:03:46,726][00268] Avg episode reward: [(0, '25.837')]
[2024-09-18 12:03:48,805][11921] Updated weights for policy 0, policy_version 1878 (0.0026)
[2024-09-18 12:03:51,719][00268] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3693.3). Total num frames: 7704576. Throughput: 0: 918.9. Samples: 922676. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-18 12:03:51,726][00268] Avg episode reward: [(0, '25.995')]
[2024-09-18 12:03:56,720][00268] Fps is (10 sec: 4096.0, 60 sec: 3754.6, 300 sec: 3679.5). Total num frames: 7720960. Throughput: 0: 948.3. Samples: 928714. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-18 12:03:56,723][00268] Avg episode reward: [(0, '27.092')]
[2024-09-18 12:04:00,954][11921] Updated weights for policy 0, policy_version 1888 (0.0019)
[2024-09-18 12:04:01,719][00268] Fps is (10 sec: 2867.1, 60 sec: 3549.8, 300 sec: 3679.5). Total num frames: 7733248. Throughput: 0: 889.5. Samples: 932852. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-18 12:04:01,726][00268] Avg episode reward: [(0, '26.959')]
[2024-09-18 12:04:06,719][00268] Fps is (10 sec: 3686.9, 60 sec: 3754.7, 300 sec: 3693.3). Total num frames: 7757824. Throughput: 0: 883.6. Samples: 935968. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-18 12:04:06,726][00268] Avg episode reward: [(0, '26.889')]
[2024-09-18 12:04:10,485][11921] Updated weights for policy 0, policy_version 1898 (0.0014)
[2024-09-18 12:04:11,719][00268] Fps is (10 sec: 4505.8, 60 sec: 3754.7, 300 sec: 3679.5). Total num frames: 7778304. Throughput: 0: 943.0. Samples: 942642. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-18 12:04:11,727][00268] Avg episode reward: [(0, '27.312')]
[2024-09-18 12:04:16,721][00268] Fps is (10 sec: 3276.1, 60 sec: 3618.0, 300 sec: 3679.4). Total num frames: 7790592. Throughput: 0: 916.5. Samples: 947258. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-18 12:04:16,733][00268] Avg episode reward: [(0, '28.035')]
[2024-09-18 12:04:21,719][00268] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3693.3). Total num frames: 7811072. Throughput: 0: 895.4. Samples: 949544. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-18 12:04:21,723][00268] Avg episode reward: [(0, '28.479')]
[2024-09-18 12:04:22,364][11921] Updated weights for policy 0, policy_version 1908 (0.0018)
[2024-09-18 12:04:26,719][00268] Fps is (10 sec: 4096.9, 60 sec: 3754.7, 300 sec: 3679.5). Total num frames: 7831552. Throughput: 0: 931.9. Samples: 956258. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-18 12:04:26,723][00268] Avg episode reward: [(0, '29.121')]
[2024-09-18 12:04:31,719][00268] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3679.5). Total num frames: 7847936. Throughput: 0: 938.9. Samples: 961672. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-18 12:04:31,727][00268] Avg episode reward: [(0, '29.297')]
[2024-09-18 12:04:31,742][11908] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001916_7847936.pth...
[2024-09-18 12:04:32,004][11908] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001700_6963200.pth
[2024-09-18 12:04:33,759][11921] Updated weights for policy 0, policy_version 1918 (0.0018)
[2024-09-18 12:04:36,719][00268] Fps is (10 sec: 3276.8, 60 sec: 3618.3, 300 sec: 3693.3). Total num frames: 7864320. Throughput: 0: 908.5. Samples: 963560. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-18 12:04:36,721][00268] Avg episode reward: [(0, '29.971')]
[2024-09-18 12:04:41,719][00268] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3679.5). Total num frames: 7884800. Throughput: 0: 908.7. Samples: 969604. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-18 12:04:41,726][00268] Avg episode reward: [(0, '30.055')]
[2024-09-18 12:04:43,851][11921] Updated weights for policy 0, policy_version 1928 (0.0015)
[2024-09-18 12:04:46,721][00268] Fps is (10 sec: 4095.2, 60 sec: 3754.6, 300 sec: 3679.4). Total num frames: 7905280. Throughput: 0: 960.5. Samples: 976074. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-18 12:04:46,728][00268] Avg episode reward: [(0, '29.777')]
[2024-09-18 12:04:51,719][00268] Fps is (10 sec: 3686.5, 60 sec: 3618.1, 300 sec: 3679.5). Total num frames: 7921664. Throughput: 0: 935.2. Samples: 978054. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-18 12:04:51,721][00268] Avg episode reward: [(0, '29.816')]
[2024-09-18 12:04:55,772][11921] Updated weights for policy 0, policy_version 1938 (0.0014)
[2024-09-18 12:04:56,720][00268] Fps is (10 sec: 3276.9, 60 sec: 3618.1, 300 sec: 3679.4). Total num frames: 7938048. Throughput: 0: 899.3. Samples: 983110. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-18 12:04:56,723][00268] Avg episode reward: [(0, '30.122')]
[2024-09-18 12:05:01,719][00268] Fps is (10 sec: 4096.0, 60 sec: 3823.0, 300 sec: 3693.3). Total num frames: 7962624. Throughput: 0: 939.9. Samples: 989552. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-18 12:05:01,725][00268] Avg episode reward: [(0, '30.431')]
[2024-09-18 12:05:01,734][11908] Saving new best policy, reward=30.431!
[2024-09-18 12:05:06,535][11921] Updated weights for policy 0, policy_version 1948 (0.0039)
[2024-09-18 12:05:06,720][00268] Fps is (10 sec: 4096.2, 60 sec: 3686.3, 300 sec: 3679.4). Total num frames: 7979008. Throughput: 0: 950.5. Samples: 992316. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-18 12:05:06,725][00268] Avg episode reward: [(0, '29.583')]
[2024-09-18 12:05:11,719][00268] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3679.5). Total num frames: 7991296. Throughput: 0: 893.5. Samples: 996464. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-18 12:05:11,726][00268] Avg episode reward: [(0, '29.750')]
[2024-09-18 12:05:14,587][11908] Stopping Batcher_0...
[2024-09-18 12:05:14,588][11908] Loop batcher_evt_loop terminating...
[2024-09-18 12:05:14,589][00268] Component Batcher_0 stopped!
[2024-09-18 12:05:14,599][11908] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001955_8007680.pth...
[2024-09-18 12:05:14,642][11921] Weights refcount: 2 0
[2024-09-18 12:05:14,648][00268] Component InferenceWorker_p0-w0 stopped!
[2024-09-18 12:05:14,656][11921] Stopping InferenceWorker_p0-w0...
[2024-09-18 12:05:14,658][11921] Loop inference_proc0-0_evt_loop terminating...
[2024-09-18 12:05:14,775][11908] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001808_7405568.pth
[2024-09-18 12:05:14,805][11908] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001955_8007680.pth...
[2024-09-18 12:05:15,011][11908] Stopping LearnerWorker_p0...
[2024-09-18 12:05:15,015][11908] Loop learner_proc0_evt_loop terminating...
[2024-09-18 12:05:15,012][00268] Component LearnerWorker_p0 stopped!
[2024-09-18 12:05:15,040][00268] Component RolloutWorker_w2 stopped!
[2024-09-18 12:05:15,047][00268] Component RolloutWorker_w4 stopped!
[2024-09-18 12:05:15,046][11924] Stopping RolloutWorker_w2...
[2024-09-18 12:05:15,052][11926] Stopping RolloutWorker_w4...
[2024-09-18 12:05:15,056][00268] Component RolloutWorker_w0 stopped!
[2024-09-18 12:05:15,053][11924] Loop rollout_proc2_evt_loop terminating...
[2024-09-18 12:05:15,054][11926] Loop rollout_proc4_evt_loop terminating...
[2024-09-18 12:05:15,061][11922] Stopping RolloutWorker_w0...
[2024-09-18 12:05:15,067][00268] Component RolloutWorker_w6 stopped!
[2024-09-18 12:05:15,066][11922] Loop rollout_proc0_evt_loop terminating...
[2024-09-18 12:05:15,072][11927] Stopping RolloutWorker_w6...
[2024-09-18 12:05:15,074][11927] Loop rollout_proc6_evt_loop terminating...
[2024-09-18 12:05:15,230][00268] Component RolloutWorker_w7 stopped!
[2024-09-18 12:05:15,233][11929] Stopping RolloutWorker_w7...
[2024-09-18 12:05:15,236][11929] Loop rollout_proc7_evt_loop terminating...
[2024-09-18 12:05:15,267][00268] Component RolloutWorker_w5 stopped!
[2024-09-18 12:05:15,270][11928] Stopping RolloutWorker_w5...
[2024-09-18 12:05:15,276][11928] Loop rollout_proc5_evt_loop terminating...
[2024-09-18 12:05:15,312][00268] Component RolloutWorker_w3 stopped!
[2024-09-18 12:05:15,314][11925] Stopping RolloutWorker_w3...
[2024-09-18 12:05:15,314][11925] Loop rollout_proc3_evt_loop terminating...
[2024-09-18 12:05:15,381][00268] Component RolloutWorker_w1 stopped!
[2024-09-18 12:05:15,387][00268] Waiting for process learner_proc0 to stop...
[2024-09-18 12:05:15,391][11923] Stopping RolloutWorker_w1...
[2024-09-18 12:05:15,393][11923] Loop rollout_proc1_evt_loop terminating...
[2024-09-18 12:05:17,441][00268] Waiting for process inference_proc0-0 to join...
[2024-09-18 12:05:17,522][00268] Waiting for process rollout_proc0 to join...
[2024-09-18 12:05:17,948][00268] Waiting for process rollout_proc1 to join...
[2024-09-18 12:05:17,970][00268] Waiting for process rollout_proc2 to join...
[2024-09-18 12:05:17,971][00268] Waiting for process rollout_proc3 to join...
[2024-09-18 12:05:17,976][00268] Waiting for process rollout_proc4 to join...
[2024-09-18 12:05:17,979][00268] Waiting for process rollout_proc5 to join...
[2024-09-18 12:05:17,980][00268] Waiting for process rollout_proc6 to join...
[2024-09-18 12:05:17,984][00268] Waiting for process rollout_proc7 to join...
[2024-09-18 12:05:17,987][00268] Batcher 0 profile tree view:
batching: 26.7487, releasing_batches: 0.0239
[2024-09-18 12:05:17,991][00268] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0000
wait_policy_total: 474.2526
update_model: 7.8235
weight_update: 0.0023
one_step: 0.0022
handle_policy_step: 577.7240
deserialize: 15.3204, stack: 3.0071, obs_to_device_normalize: 117.7529, forward: 291.3099, send_messages: 29.0593
prepare_outputs: 91.9556
to_cpu: 58.5476
[2024-09-18 12:05:17,993][00268] Learner 0 profile tree view:
misc: 0.0055, prepare_batch: 16.1687
train: 79.2732
epoch_init: 0.0102, minibatch_init: 0.0137, losses_postprocess: 0.5746, kl_divergence: 0.5765, after_optimizer: 3.2603
calculate_losses: 24.3960
losses_init: 0.0093, forward_head: 1.8492, bptt_initial: 15.5723, tail: 1.1127, advantages_returns: 0.2897, losses: 2.6898
bptt: 2.4617
bptt_forward_core: 2.3822
update: 49.7214
clip: 1.4882
[2024-09-18 12:05:17,994][00268] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.3711, enqueue_policy_requests: 119.1639, env_step: 857.3822, overhead: 14.7818, complete_rollouts: 7.3833
save_policy_outputs: 26.2504
split_output_tensors: 8.8148
[2024-09-18 12:05:17,995][00268] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.3241, enqueue_policy_requests: 117.0709, env_step: 859.9379, overhead: 15.5190, complete_rollouts: 7.2719
save_policy_outputs: 26.8021
split_output_tensors: 8.9067
[2024-09-18 12:05:17,996][00268] Loop Runner_EvtLoop terminating...
[2024-09-18 12:05:17,998][00268] Runner profile tree view:
main_loop: 1124.6084
[2024-09-18 12:05:17,999][00268] Collected {0: 8007680}, FPS: 3558.4
[2024-09-18 12:05:18,033][00268] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2024-09-18 12:05:18,034][00268] Overriding arg 'num_workers' with value 1 passed from command line
[2024-09-18 12:05:18,036][00268] Adding new argument 'no_render'=True that is not in the saved config file!
[2024-09-18 12:05:18,038][00268] Adding new argument 'save_video'=True that is not in the saved config file!
[2024-09-18 12:05:18,040][00268] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2024-09-18 12:05:18,041][00268] Adding new argument 'video_name'=None that is not in the saved config file!
[2024-09-18 12:05:18,043][00268] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
[2024-09-18 12:05:18,044][00268] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2024-09-18 12:05:18,045][00268] Adding new argument 'push_to_hub'=False that is not in the saved config file!
[2024-09-18 12:05:18,046][00268] Adding new argument 'hf_repository'=None that is not in the saved config file!
[2024-09-18 12:05:18,047][00268] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2024-09-18 12:05:18,048][00268] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2024-09-18 12:05:18,049][00268] Adding new argument 'train_script'=None that is not in the saved config file!
[2024-09-18 12:05:18,051][00268] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2024-09-18 12:05:18,052][00268] Using frameskip 1 and render_action_repeat=4 for evaluation
[2024-09-18 12:05:18,061][00268] RunningMeanStd input shape: (3, 72, 128)
[2024-09-18 12:05:18,069][00268] RunningMeanStd input shape: (1,)
[2024-09-18 12:05:18,082][00268] ConvEncoder: input_channels=3
[2024-09-18 12:05:18,143][00268] Conv encoder output size: 512
[2024-09-18 12:05:18,145][00268] Policy head output size: 512
[2024-09-18 12:05:18,171][00268] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001955_8007680.pth...
[2024-09-18 12:05:18,658][00268] Num frames 100...
[2024-09-18 12:05:18,781][00268] Num frames 200...
[2024-09-18 12:05:18,897][00268] Num frames 300...
[2024-09-18 12:05:19,051][00268] Num frames 400...
[2024-09-18 12:05:19,174][00268] Num frames 500...
[2024-09-18 12:05:19,297][00268] Num frames 600...
[2024-09-18 12:05:19,418][00268] Num frames 700...
[2024-09-18 12:05:19,535][00268] Num frames 800...
[2024-09-18 12:05:19,651][00268] Num frames 900...
[2024-09-18 12:05:19,775][00268] Num frames 1000...
[2024-09-18 12:05:19,895][00268] Num frames 1100...
[2024-09-18 12:05:19,980][00268] Avg episode rewards: #0: 27.200, true rewards: #0: 11.200
[2024-09-18 12:05:19,982][00268] Avg episode reward: 27.200, avg true_objective: 11.200
[2024-09-18 12:05:20,083][00268] Num frames 1200...
[2024-09-18 12:05:20,198][00268] Num frames 1300...
[2024-09-18 12:05:20,314][00268] Num frames 1400...
[2024-09-18 12:05:20,433][00268] Num frames 1500...
[2024-09-18 12:05:20,569][00268] Num frames 1600...
[2024-09-18 12:05:20,736][00268] Num frames 1700...
[2024-09-18 12:05:20,904][00268] Num frames 1800...
[2024-09-18 12:05:21,083][00268] Num frames 1900...
[2024-09-18 12:05:21,240][00268] Num frames 2000...
[2024-09-18 12:05:21,408][00268] Num frames 2100...
[2024-09-18 12:05:21,567][00268] Num frames 2200...
[2024-09-18 12:05:21,727][00268] Num frames 2300...
[2024-09-18 12:05:21,895][00268] Num frames 2400...
[2024-09-18 12:05:22,078][00268] Num frames 2500...
[2024-09-18 12:05:22,241][00268] Num frames 2600...
[2024-09-18 12:05:22,375][00268] Avg episode rewards: #0: 35.225, true rewards: #0: 13.225
[2024-09-18 12:05:22,378][00268] Avg episode reward: 35.225, avg true_objective: 13.225
[2024-09-18 12:05:22,475][00268] Num frames 2700...
[2024-09-18 12:05:22,640][00268] Num frames 2800...
[2024-09-18 12:05:22,810][00268] Num frames 2900...
[2024-09-18 12:05:22,949][00268] Num frames 3000...
[2024-09-18 12:05:23,065][00268] Num frames 3100...
[2024-09-18 12:05:23,190][00268] Num frames 3200...
[2024-09-18 12:05:23,309][00268] Num frames 3300...
[2024-09-18 12:05:23,427][00268] Num frames 3400...
[2024-09-18 12:05:23,546][00268] Num frames 3500...
[2024-09-18 12:05:23,670][00268] Num frames 3600...
[2024-09-18 12:05:23,790][00268] Num frames 3700...
[2024-09-18 12:05:23,912][00268] Num frames 3800...
[2024-09-18 12:05:24,043][00268] Num frames 3900...
[2024-09-18 12:05:24,184][00268] Num frames 4000...
[2024-09-18 12:05:24,309][00268] Num frames 4100...
[2024-09-18 12:05:24,429][00268] Num frames 4200...
[2024-09-18 12:05:24,549][00268] Num frames 4300...
[2024-09-18 12:05:24,669][00268] Num frames 4400...
[2024-09-18 12:05:24,789][00268] Num frames 4500...
[2024-09-18 12:05:24,912][00268] Num frames 4600...
[2024-09-18 12:05:25,008][00268] Avg episode rewards: #0: 39.423, true rewards: #0: 15.423
[2024-09-18 12:05:25,009][00268] Avg episode reward: 39.423, avg true_objective: 15.423
[2024-09-18 12:05:25,095][00268] Num frames 4700...
[2024-09-18 12:05:25,219][00268] Num frames 4800...
[2024-09-18 12:05:25,331][00268] Num frames 4900...
[2024-09-18 12:05:25,450][00268] Num frames 5000...
[2024-09-18 12:05:25,568][00268] Num frames 5100...
[2024-09-18 12:05:25,684][00268] Num frames 5200...
[2024-09-18 12:05:25,808][00268] Num frames 5300...
[2024-09-18 12:05:25,933][00268] Num frames 5400...
[2024-09-18 12:05:26,058][00268] Num frames 5500...
[2024-09-18 12:05:26,181][00268] Num frames 5600...
[2024-09-18 12:05:26,308][00268] Num frames 5700...
[2024-09-18 12:05:26,428][00268] Num frames 5800...
[2024-09-18 12:05:26,546][00268] Num frames 5900...
[2024-09-18 12:05:26,665][00268] Num frames 6000...
[2024-09-18 12:05:26,813][00268] Avg episode rewards: #0: 39.445, true rewards: #0: 15.195
[2024-09-18 12:05:26,816][00268] Avg episode reward: 39.445, avg true_objective: 15.195
[2024-09-18 12:05:26,845][00268] Num frames 6100...
[2024-09-18 12:05:26,970][00268] Num frames 6200...
[2024-09-18 12:05:27,088][00268] Num frames 6300...
[2024-09-18 12:05:27,206][00268] Num frames 6400...
[2024-09-18 12:05:27,334][00268] Num frames 6500...
[2024-09-18 12:05:27,450][00268] Num frames 6600...
[2024-09-18 12:05:27,637][00268] Avg episode rewards: #0: 34.192, true rewards: #0: 13.392
[2024-09-18 12:05:27,639][00268] Avg episode reward: 34.192, avg true_objective: 13.392
[2024-09-18 12:05:27,649][00268] Num frames 6700...
[2024-09-18 12:05:27,765][00268] Num frames 6800...
[2024-09-18 12:05:27,881][00268] Num frames 6900...
[2024-09-18 12:05:28,010][00268] Num frames 7000...
[2024-09-18 12:05:28,135][00268] Num frames 7100...
[2024-09-18 12:05:28,273][00268] Num frames 7200...
[2024-09-18 12:05:28,400][00268] Num frames 7300...
[2024-09-18 12:05:28,519][00268] Num frames 7400...
[2024-09-18 12:05:28,642][00268] Num frames 7500...
[2024-09-18 12:05:28,761][00268] Num frames 7600...
[2024-09-18 12:05:28,881][00268] Num frames 7700...
[2024-09-18 12:05:29,011][00268] Num frames 7800...
[2024-09-18 12:05:29,139][00268] Num frames 7900...
[2024-09-18 12:05:29,260][00268] Num frames 8000...
[2024-09-18 12:05:29,383][00268] Num frames 8100...
[2024-09-18 12:05:29,507][00268] Num frames 8200...
[2024-09-18 12:05:29,559][00268] Avg episode rewards: #0: 34.500, true rewards: #0: 13.667
[2024-09-18 12:05:29,561][00268] Avg episode reward: 34.500, avg true_objective: 13.667
[2024-09-18 12:05:29,675][00268] Num frames 8300...
[2024-09-18 12:05:29,788][00268] Num frames 8400...
[2024-09-18 12:05:29,909][00268] Num frames 8500...
[2024-09-18 12:05:30,032][00268] Num frames 8600...
[2024-09-18 12:05:30,158][00268] Num frames 8700...
[2024-09-18 12:05:30,275][00268] Num frames 8800...
[2024-09-18 12:05:30,402][00268] Num frames 8900...
[2024-09-18 12:05:30,520][00268] Num frames 9000...
[2024-09-18 12:05:30,639][00268] Num frames 9100...
[2024-09-18 12:05:30,755][00268] Num frames 9200...
[2024-09-18 12:05:30,881][00268] Num frames 9300...
[2024-09-18 12:05:31,008][00268] Num frames 9400...
[2024-09-18 12:05:31,133][00268] Num frames 9500...
[2024-09-18 12:05:31,252][00268] Num frames 9600...
[2024-09-18 12:05:31,378][00268] Num frames 9700...
[2024-09-18 12:05:31,495][00268] Num frames 9800...
[2024-09-18 12:05:31,616][00268] Num frames 9900...
[2024-09-18 12:05:31,738][00268] Num frames 10000...
[2024-09-18 12:05:31,863][00268] Num frames 10100...
[2024-09-18 12:05:31,988][00268] Num frames 10200...
[2024-09-18 12:05:32,117][00268] Num frames 10300...
[2024-09-18 12:05:32,170][00268] Avg episode rewards: #0: 38.000, true rewards: #0: 14.714
[2024-09-18 12:05:32,172][00268] Avg episode reward: 38.000, avg true_objective: 14.714
[2024-09-18 12:05:32,287][00268] Num frames 10400...
[2024-09-18 12:05:32,411][00268] Num frames 10500...
[2024-09-18 12:05:32,525][00268] Num frames 10600...
[2024-09-18 12:05:32,640][00268] Num frames 10700...
[2024-09-18 12:05:32,757][00268] Num frames 10800...
[2024-09-18 12:05:32,887][00268] Num frames 10900...
[2024-09-18 12:05:33,055][00268] Num frames 11000...
[2024-09-18 12:05:33,223][00268] Num frames 11100...
[2024-09-18 12:05:33,389][00268] Num frames 11200...
[2024-09-18 12:05:33,555][00268] Num frames 11300...
[2024-09-18 12:05:33,713][00268] Num frames 11400...
[2024-09-18 12:05:33,805][00268] Avg episode rewards: #0: 36.400, true rewards: #0: 14.275
[2024-09-18 12:05:33,807][00268] Avg episode reward: 36.400, avg true_objective: 14.275
[2024-09-18 12:05:33,948][00268] Num frames 11500...
[2024-09-18 12:05:34,113][00268] Num frames 11600...
[2024-09-18 12:05:34,287][00268] Num frames 11700...
[2024-09-18 12:05:34,471][00268] Num frames 11800...
[2024-09-18 12:05:34,638][00268] Num frames 11900...
[2024-09-18 12:05:34,809][00268] Num frames 12000...
[2024-09-18 12:05:34,995][00268] Num frames 12100...
[2024-09-18 12:05:35,185][00268] Num frames 12200...
[2024-09-18 12:05:35,344][00268] Num frames 12300...
[2024-09-18 12:05:35,473][00268] Num frames 12400...
[2024-09-18 12:05:35,603][00268] Num frames 12500...
[2024-09-18 12:05:35,723][00268] Num frames 12600...
[2024-09-18 12:05:35,848][00268] Num frames 12700...
[2024-09-18 12:05:35,974][00268] Num frames 12800...
[2024-09-18 12:05:36,098][00268] Num frames 12900...
[2024-09-18 12:05:36,220][00268] Num frames 13000...
[2024-09-18 12:05:36,344][00268] Num frames 13100...
[2024-09-18 12:05:36,468][00268] Num frames 13200...
[2024-09-18 12:05:36,599][00268] Num frames 13300...
[2024-09-18 12:05:36,722][00268] Num frames 13400...
[2024-09-18 12:05:36,843][00268] Num frames 13500...
[2024-09-18 12:05:36,924][00268] Avg episode rewards: #0: 39.133, true rewards: #0: 15.022
[2024-09-18 12:05:36,925][00268] Avg episode reward: 39.133, avg true_objective: 15.022
[2024-09-18 12:05:37,025][00268] Num frames 13600...
[2024-09-18 12:05:37,144][00268] Num frames 13700...
[2024-09-18 12:05:37,268][00268] Num frames 13800...
[2024-09-18 12:05:37,391][00268] Num frames 13900...
[2024-09-18 12:05:37,515][00268] Num frames 14000...
[2024-09-18 12:05:37,652][00268] Num frames 14100...
[2024-09-18 12:05:37,797][00268] Num frames 14200...
[2024-09-18 12:05:38,025][00268] Num frames 14300...
[2024-09-18 12:05:38,255][00268] Num frames 14400...
[2024-09-18 12:05:38,462][00268] Num frames 14500...
[2024-09-18 12:05:38,569][00268] Avg episode rewards: #0: 37.544, true rewards: #0: 14.544
[2024-09-18 12:05:38,571][00268] Avg episode reward: 37.544, avg true_objective: 14.544
[2024-09-18 12:07:11,264][00268] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
[2024-09-18 12:07:11,919][00268] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2024-09-18 12:07:11,922][00268] Overriding arg 'num_workers' with value 1 passed from command line
[2024-09-18 12:07:11,924][00268] Adding new argument 'no_render'=True that is not in the saved config file!
[2024-09-18 12:07:11,926][00268] Adding new argument 'save_video'=True that is not in the saved config file!
[2024-09-18 12:07:11,927][00268] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2024-09-18 12:07:11,929][00268] Adding new argument 'video_name'=None that is not in the saved config file!
[2024-09-18 12:07:11,931][00268] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2024-09-18 12:07:11,932][00268] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2024-09-18 12:07:11,933][00268] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2024-09-18 12:07:11,934][00268] Adding new argument 'hf_repository'='mkdem/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
[2024-09-18 12:07:11,935][00268] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2024-09-18 12:07:11,936][00268] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2024-09-18 12:07:11,937][00268] Adding new argument 'train_script'=None that is not in the saved config file!
[2024-09-18 12:07:11,938][00268] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2024-09-18 12:07:11,939][00268] Using frameskip 1 and render_action_repeat=4 for evaluation
[2024-09-18 12:07:11,951][00268] RunningMeanStd input shape: (3, 72, 128)
[2024-09-18 12:07:11,955][00268] RunningMeanStd input shape: (1,)
[2024-09-18 12:07:11,982][00268] ConvEncoder: input_channels=3
[2024-09-18 12:07:12,041][00268] Conv encoder output size: 512
[2024-09-18 12:07:12,043][00268] Policy head output size: 512
[2024-09-18 12:07:12,069][00268] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001955_8007680.pth...
[2024-09-18 12:07:12,848][00268] Num frames 100...
[2024-09-18 12:07:13,007][00268] Num frames 200...
[2024-09-18 12:07:13,157][00268] Num frames 300...
[2024-09-18 12:07:13,318][00268] Num frames 400...
[2024-09-18 12:07:13,469][00268] Num frames 500...
[2024-09-18 12:07:13,617][00268] Num frames 600...
[2024-09-18 12:07:13,769][00268] Num frames 700...
[2024-09-18 12:07:13,919][00268] Num frames 800...
[2024-09-18 12:07:14,075][00268] Num frames 900...
[2024-09-18 12:07:14,230][00268] Num frames 1000...
[2024-09-18 12:07:14,400][00268] Num frames 1100...
[2024-09-18 12:07:14,555][00268] Num frames 1200...
[2024-09-18 12:07:14,706][00268] Num frames 1300...
[2024-09-18 12:07:14,859][00268] Num frames 1400...
[2024-09-18 12:07:15,016][00268] Num frames 1500...
[2024-09-18 12:07:15,171][00268] Num frames 1600...
[2024-09-18 12:07:15,327][00268] Num frames 1700...
[2024-09-18 12:07:15,494][00268] Num frames 1800...
[2024-09-18 12:07:15,671][00268] Num frames 1900...
[2024-09-18 12:07:15,860][00268] Avg episode rewards: #0: 48.839, true rewards: #0: 19.840
[2024-09-18 12:07:15,862][00268] Avg episode reward: 48.839, avg true_objective: 19.840
[2024-09-18 12:07:15,891][00268] Num frames 2000...
[2024-09-18 12:07:16,051][00268] Num frames 2100...
[2024-09-18 12:07:16,221][00268] Num frames 2200...
[2024-09-18 12:07:16,393][00268] Num frames 2300...
[2024-09-18 12:07:16,547][00268] Num frames 2400...
[2024-09-18 12:07:16,722][00268] Num frames 2500...
[2024-09-18 12:07:16,925][00268] Num frames 2600...
[2024-09-18 12:07:17,116][00268] Num frames 2700...
[2024-09-18 12:07:17,314][00268] Num frames 2800...
[2024-09-18 12:07:17,516][00268] Num frames 2900...
[2024-09-18 12:07:17,705][00268] Num frames 3000...
[2024-09-18 12:07:17,938][00268] Num frames 3100...
[2024-09-18 12:07:18,122][00268] Num frames 3200...
[2024-09-18 12:07:18,332][00268] Num frames 3300...
[2024-09-18 12:07:18,519][00268] Num frames 3400...
[2024-09-18 12:07:18,687][00268] Num frames 3500...
[2024-09-18 12:07:18,860][00268] Num frames 3600...
[2024-09-18 12:07:19,056][00268] Num frames 3700...
[2024-09-18 12:07:19,299][00268] Num frames 3800...
[2024-09-18 12:07:19,527][00268] Num frames 3900...
[2024-09-18 12:07:19,758][00268] Num frames 4000...
[2024-09-18 12:07:20,020][00268] Avg episode rewards: #0: 54.919, true rewards: #0: 20.420
[2024-09-18 12:07:20,023][00268] Avg episode reward: 54.919, avg true_objective: 20.420
[2024-09-18 12:07:20,062][00268] Num frames 4100...
[2024-09-18 12:07:20,266][00268] Num frames 4200...
[2024-09-18 12:07:20,499][00268] Num frames 4300...
[2024-09-18 12:07:20,719][00268] Num frames 4400...
[2024-09-18 12:07:20,919][00268] Num frames 4500...
[2024-09-18 12:07:21,132][00268] Num frames 4600...
[2024-09-18 12:07:21,346][00268] Num frames 4700...
[2024-09-18 12:07:21,559][00268] Num frames 4800...
[2024-09-18 12:07:21,750][00268] Num frames 4900...
[2024-09-18 12:07:21,913][00268] Num frames 5000...
[2024-09-18 12:07:22,071][00268] Num frames 5100...
[2024-09-18 12:07:22,236][00268] Num frames 5200...
[2024-09-18 12:07:22,398][00268] Num frames 5300...
[2024-09-18 12:07:22,556][00268] Num frames 5400...
[2024-09-18 12:07:22,720][00268] Num frames 5500...
[2024-09-18 12:07:22,890][00268] Num frames 5600...
[2024-09-18 12:07:23,038][00268] Num frames 5700...
[2024-09-18 12:07:23,162][00268] Num frames 5800...
[2024-09-18 12:07:23,282][00268] Num frames 5900...
[2024-09-18 12:07:23,401][00268] Num frames 6000...
[2024-09-18 12:07:23,521][00268] Num frames 6100...
[2024-09-18 12:07:23,690][00268] Avg episode rewards: #0: 55.279, true rewards: #0: 20.613
[2024-09-18 12:07:23,692][00268] Avg episode reward: 55.279, avg true_objective: 20.613
[2024-09-18 12:07:23,715][00268] Num frames 6200...
[2024-09-18 12:07:23,834][00268] Num frames 6300...
[2024-09-18 12:07:23,960][00268] Num frames 6400...
[2024-09-18 12:07:24,079][00268] Num frames 6500...
[2024-09-18 12:07:24,198][00268] Num frames 6600...
[2024-09-18 12:07:24,321][00268] Num frames 6700...
[2024-09-18 12:07:24,447][00268] Avg episode rewards: #0: 43.649, true rewards: #0: 16.900
[2024-09-18 12:07:24,448][00268] Avg episode reward: 43.649, avg true_objective: 16.900
[2024-09-18 12:07:24,499][00268] Num frames 6800...
[2024-09-18 12:07:24,625][00268] Num frames 6900...
[2024-09-18 12:07:24,743][00268] Num frames 7000...
[2024-09-18 12:07:24,863][00268] Num frames 7100...
[2024-09-18 12:07:24,986][00268] Num frames 7200...
[2024-09-18 12:07:25,106][00268] Num frames 7300...
[2024-09-18 12:07:25,230][00268] Num frames 7400...
[2024-09-18 12:07:25,284][00268] Avg episode rewards: #0: 36.999, true rewards: #0: 14.800
[2024-09-18 12:07:25,286][00268] Avg episode reward: 36.999, avg true_objective: 14.800
[2024-09-18 12:07:25,404][00268] Num frames 7500...
[2024-09-18 12:07:25,523][00268] Num frames 7600...
[2024-09-18 12:07:25,650][00268] Num frames 7700...
[2024-09-18 12:07:25,765][00268] Num frames 7800...
[2024-09-18 12:07:25,883][00268] Num frames 7900...
[2024-09-18 12:07:26,002][00268] Num frames 8000...
[2024-09-18 12:07:26,127][00268] Num frames 8100...
[2024-09-18 12:07:26,244][00268] Num frames 8200...
[2024-09-18 12:07:26,361][00268] Num frames 8300...
[2024-09-18 12:07:26,489][00268] Num frames 8400...
[2024-09-18 12:07:26,624][00268] Num frames 8500...
[2024-09-18 12:07:26,748][00268] Num frames 8600...
[2024-09-18 12:07:26,874][00268] Num frames 8700...
[2024-09-18 12:07:26,995][00268] Num frames 8800...
[2024-09-18 12:07:27,120][00268] Num frames 8900...
[2024-09-18 12:07:27,243][00268] Num frames 9000...
[2024-09-18 12:07:27,368][00268] Num frames 9100...
[2024-09-18 12:07:27,489][00268] Num frames 9200...
[2024-09-18 12:07:27,626][00268] Num frames 9300...
[2024-09-18 12:07:27,756][00268] Num frames 9400...
[2024-09-18 12:07:27,886][00268] Num frames 9500...
[2024-09-18 12:07:27,940][00268] Avg episode rewards: #0: 40.499, true rewards: #0: 15.833
[2024-09-18 12:07:27,942][00268] Avg episode reward: 40.499, avg true_objective: 15.833
[2024-09-18 12:07:28,065][00268] Num frames 9600...
[2024-09-18 12:07:28,208][00268] Num frames 9700...
[2024-09-18 12:07:28,344][00268] Num frames 9800...
[2024-09-18 12:07:28,500][00268] Avg episode rewards: #0: 35.691, true rewards: #0: 14.120
[2024-09-18 12:07:28,502][00268] Avg episode reward: 35.691, avg true_objective: 14.120
[2024-09-18 12:07:28,524][00268] Num frames 9900...
[2024-09-18 12:07:28,649][00268] Num frames 10000...
[2024-09-18 12:07:28,768][00268] Num frames 10100...
[2024-09-18 12:07:28,882][00268] Num frames 10200...
[2024-09-18 12:07:28,998][00268] Num frames 10300...
[2024-09-18 12:07:29,122][00268] Num frames 10400...
[2024-09-18 12:07:29,243][00268] Num frames 10500...
[2024-09-18 12:07:29,360][00268] Num frames 10600...
[2024-09-18 12:07:29,477][00268] Num frames 10700...
[2024-09-18 12:07:29,602][00268] Num frames 10800...
[2024-09-18 12:07:29,724][00268] Num frames 10900...
[2024-09-18 12:07:29,845][00268] Num frames 11000...
[2024-09-18 12:07:29,962][00268] Num frames 11100...
[2024-09-18 12:07:30,082][00268] Num frames 11200...
[2024-09-18 12:07:30,209][00268] Num frames 11300...
[2024-09-18 12:07:30,329][00268] Num frames 11400...
[2024-09-18 12:07:30,450][00268] Num frames 11500...
[2024-09-18 12:07:30,573][00268] Num frames 11600...
[2024-09-18 12:07:30,705][00268] Num frames 11700...
[2024-09-18 12:07:30,878][00268] Avg episode rewards: #0: 37.748, true rewards: #0: 14.749
[2024-09-18 12:07:30,881][00268] Avg episode reward: 37.748, avg true_objective: 14.749
[2024-09-18 12:07:30,885][00268] Num frames 11800...
[2024-09-18 12:07:31,002][00268] Num frames 11900...
[2024-09-18 12:07:31,122][00268] Num frames 12000...
[2024-09-18 12:07:31,249][00268] Num frames 12100...
[2024-09-18 12:07:31,367][00268] Num frames 12200...
[2024-09-18 12:07:31,484][00268] Num frames 12300...
[2024-09-18 12:07:31,627][00268] Avg episode rewards: #0: 34.972, true rewards: #0: 13.750
[2024-09-18 12:07:31,629][00268] Avg episode reward: 34.972, avg true_objective: 13.750
[2024-09-18 12:07:31,668][00268] Num frames 12400...
[2024-09-18 12:07:31,837][00268] Num frames 12500...
[2024-09-18 12:07:32,009][00268] Num frames 12600...
[2024-09-18 12:07:32,177][00268] Num frames 12700...
[2024-09-18 12:07:32,399][00268] Avg episode rewards: #0: 32.098, true rewards: #0: 12.799
[2024-09-18 12:07:32,401][00268] Avg episode reward: 32.098, avg true_objective: 12.799
[2024-09-18 12:07:32,408][00268] Num frames 12800...
[2024-09-18 12:08:53,783][00268] Replay video saved to /content/train_dir/default_experiment/replay.mp4!