[2023-02-24 17:43:30,092][01148] Saving configuration to /content/train_dir/default_experiment/config.json...
[2023-02-24 17:43:30,094][01148] Rollout worker 0 uses device cpu
[2023-02-24 17:43:30,095][01148] Rollout worker 1 uses device cpu
[2023-02-24 17:43:30,096][01148] Rollout worker 2 uses device cpu
[2023-02-24 17:43:30,098][01148] Rollout worker 3 uses device cpu
[2023-02-24 17:43:30,099][01148] Rollout worker 4 uses device cpu
[2023-02-24 17:43:30,101][01148] Rollout worker 5 uses device cpu
[2023-02-24 17:43:30,103][01148] Rollout worker 6 uses device cpu
[2023-02-24 17:43:30,105][01148] Rollout worker 7 uses device cpu
[2023-02-24 17:43:30,327][01148] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-24 17:43:30,330][01148] InferenceWorker_p0-w0: min num requests: 2
[2023-02-24 17:43:30,368][01148] Starting all processes...
[2023-02-24 17:43:30,371][01148] Starting process learner_proc0
[2023-02-24 17:43:30,446][01148] Starting all processes...
[2023-02-24 17:43:30,460][01148] Starting process inference_proc0-0
[2023-02-24 17:43:30,474][01148] Starting process rollout_proc0
[2023-02-24 17:43:30,477][01148] Starting process rollout_proc1
[2023-02-24 17:43:30,477][01148] Starting process rollout_proc2
[2023-02-24 17:43:30,477][01148] Starting process rollout_proc3
[2023-02-24 17:43:30,477][01148] Starting process rollout_proc4
[2023-02-24 17:43:30,477][01148] Starting process rollout_proc5
[2023-02-24 17:43:30,477][01148] Starting process rollout_proc6
[2023-02-24 17:43:30,477][01148] Starting process rollout_proc7
[2023-02-24 17:43:39,775][17394] Worker 6 uses CPU cores [0]
[2023-02-24 17:43:40,185][17389] Worker 1 uses CPU cores [1]
[2023-02-24 17:43:40,409][17373] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-24 17:43:40,409][17373] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2023-02-24 17:43:40,426][17386] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-24 17:43:40,435][17386] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2023-02-24 17:43:40,493][17395] Worker 7 uses CPU cores [1]
[2023-02-24 17:43:40,561][17391] Worker 3 uses CPU cores [1]
[2023-02-24 17:43:40,608][17392] Worker 4 uses CPU cores [0]
[2023-02-24 17:43:40,666][17390] Worker 2 uses CPU cores [0]
[2023-02-24 17:43:40,668][17388] Worker 0 uses CPU cores [0]
[2023-02-24 17:43:40,712][17393] Worker 5 uses CPU cores [1]
[2023-02-24 17:43:41,167][17373] Num visible devices: 1
[2023-02-24 17:43:41,168][17386] Num visible devices: 1
[2023-02-24 17:43:41,182][17373] Starting seed is not provided
[2023-02-24 17:43:41,182][17373] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-24 17:43:41,182][17373] Initializing actor-critic model on device cuda:0
[2023-02-24 17:43:41,182][17373] RunningMeanStd input shape: (3, 72, 128)
[2023-02-24 17:43:41,184][17373] RunningMeanStd input shape: (1,)
[2023-02-24 17:43:41,196][17373] ConvEncoder: input_channels=3
[2023-02-24 17:43:41,464][17373] Conv encoder output size: 512
[2023-02-24 17:43:41,464][17373] Policy head output size: 512
[2023-02-24 17:43:41,508][17373] Created Actor Critic model with architecture:
[2023-02-24 17:43:41,508][17373] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2023-02-24 17:43:48,714][17373] Using optimizer
[2023-02-24 17:43:48,715][17373] No checkpoints found
[2023-02-24 17:43:48,715][17373] Did not load from checkpoint, starting from scratch!
[2023-02-24 17:43:48,715][17373] Initialized policy 0 weights for model version 0
[2023-02-24 17:43:48,720][17373] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-24 17:43:48,727][17373] LearnerWorker_p0 finished initialization!
[2023-02-24 17:43:48,926][17386] RunningMeanStd input shape: (3, 72, 128)
[2023-02-24 17:43:48,927][17386] RunningMeanStd input shape: (1,)
[2023-02-24 17:43:48,940][17386] ConvEncoder: input_channels=3
[2023-02-24 17:43:49,045][17386] Conv encoder output size: 512
[2023-02-24 17:43:49,045][17386] Policy head output size: 512
[2023-02-24 17:43:49,556][01148] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-02-24 17:43:50,318][01148] Heartbeat connected on Batcher_0
[2023-02-24 17:43:50,325][01148] Heartbeat connected on LearnerWorker_p0
[2023-02-24 17:43:50,340][01148] Heartbeat connected on RolloutWorker_w0
[2023-02-24 17:43:50,346][01148] Heartbeat connected on RolloutWorker_w1
[2023-02-24 17:43:50,350][01148] Heartbeat connected on RolloutWorker_w2
[2023-02-24 17:43:50,351][01148] Heartbeat connected on RolloutWorker_w3
[2023-02-24 17:43:50,359][01148] Heartbeat connected on RolloutWorker_w4
[2023-02-24 17:43:50,361][01148] Heartbeat connected on RolloutWorker_w5
[2023-02-24 17:43:50,367][01148] Heartbeat connected on RolloutWorker_w6
[2023-02-24 17:43:50,372][01148] Heartbeat connected on RolloutWorker_w7
[2023-02-24 17:43:51,352][01148] Inference worker 0-0 is ready!
[2023-02-24 17:43:51,353][01148] All inference workers are ready! Signal rollout workers to start!
[2023-02-24 17:43:51,356][01148] Heartbeat connected on InferenceWorker_p0-w0
[2023-02-24 17:43:51,483][17391] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-24 17:43:51,498][17392] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-24 17:43:51,505][17393] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-24 17:43:51,507][17390] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-24 17:43:51,528][17388] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-24 17:43:51,532][17394] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-24 17:43:51,531][17389] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-24 17:43:51,534][17395] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-24 17:43:52,365][17394] Decorrelating experience for 0 frames...
[2023-02-24 17:43:52,366][17392] Decorrelating experience for 0 frames...
[2023-02-24 17:43:52,900][17393] Decorrelating experience for 0 frames...
[2023-02-24 17:43:52,907][17391] Decorrelating experience for 0 frames...
[2023-02-24 17:43:52,921][17395] Decorrelating experience for 0 frames...
[2023-02-24 17:43:52,935][17389] Decorrelating experience for 0 frames...
[2023-02-24 17:43:53,082][17392] Decorrelating experience for 32 frames...
[2023-02-24 17:43:53,185][17388] Decorrelating experience for 0 frames...
[2023-02-24 17:43:53,910][17391] Decorrelating experience for 32 frames...
[2023-02-24 17:43:53,923][17395] Decorrelating experience for 32 frames...
[2023-02-24 17:43:54,008][17393] Decorrelating experience for 32 frames...
[2023-02-24 17:43:54,012][17390] Decorrelating experience for 0 frames...
[2023-02-24 17:43:54,347][17388] Decorrelating experience for 32 frames...
[2023-02-24 17:43:54,462][17392] Decorrelating experience for 64 frames...
[2023-02-24 17:43:54,556][01148] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-02-24 17:43:55,027][17394] Decorrelating experience for 32 frames...
[2023-02-24 17:43:55,064][17390] Decorrelating experience for 32 frames...
[2023-02-24 17:43:55,109][17389] Decorrelating experience for 32 frames...
[2023-02-24 17:43:55,289][17395] Decorrelating experience for 64 frames...
[2023-02-24 17:43:55,294][17393] Decorrelating experience for 64 frames...
[2023-02-24 17:43:56,516][17388] Decorrelating experience for 64 frames...
[2023-02-24 17:43:56,722][17389] Decorrelating experience for 64 frames...
[2023-02-24 17:43:56,906][17390] Decorrelating experience for 64 frames...
[2023-02-24 17:43:56,942][17394] Decorrelating experience for 64 frames...
[2023-02-24 17:43:56,973][17393] Decorrelating experience for 96 frames...
[2023-02-24 17:43:56,986][17392] Decorrelating experience for 96 frames...
[2023-02-24 17:43:58,282][17388] Decorrelating experience for 96 frames...
[2023-02-24 17:43:58,352][17390] Decorrelating experience for 96 frames...
[2023-02-24 17:43:58,366][17394] Decorrelating experience for 96 frames...
[2023-02-24 17:43:58,840][17391] Decorrelating experience for 64 frames...
[2023-02-24 17:43:59,018][17395] Decorrelating experience for 96 frames...
[2023-02-24 17:43:59,558][01148] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-02-24 17:43:59,834][17389] Decorrelating experience for 96 frames...
[2023-02-24 17:44:00,052][17391] Decorrelating experience for 96 frames...
[2023-02-24 17:44:04,556][01148] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 69.3. Samples: 1040. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-02-24 17:44:04,563][01148] Avg episode reward: [(0, '1.120')]
[2023-02-24 17:44:05,201][17373] Signal inference workers to stop experience collection...
[2023-02-24 17:44:05,210][17386] InferenceWorker_p0-w0: stopping experience collection
[2023-02-24 17:44:07,697][17373] Signal inference workers to resume experience collection...
[2023-02-24 17:44:07,700][17386] InferenceWorker_p0-w0: resuming experience collection
[2023-02-24 17:44:09,556][01148] Fps is (10 sec: 1229.0, 60 sec: 614.4, 300 sec: 614.4). Total num frames: 12288. Throughput: 0: 161.3. Samples: 3226. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-02-24 17:44:09,565][01148] Avg episode reward: [(0, '2.736')]
[2023-02-24 17:44:14,556][01148] Fps is (10 sec: 3276.7, 60 sec: 1310.7, 300 sec: 1310.7). Total num frames: 32768. Throughput: 0: 270.6. Samples: 6766. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2023-02-24 17:44:14,558][01148] Avg episode reward: [(0, '3.720')]
[2023-02-24 17:44:16,313][17386] Updated weights for policy 0, policy_version 10 (0.0354)
[2023-02-24 17:44:19,556][01148] Fps is (10 sec: 3686.5, 60 sec: 1638.4, 300 sec: 1638.4). Total num frames: 49152. Throughput: 0: 399.6. Samples: 11988. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2023-02-24 17:44:19,558][01148] Avg episode reward: [(0, '4.354')]
[2023-02-24 17:44:24,556][01148] Fps is (10 sec: 2867.3, 60 sec: 1755.4, 300 sec: 1755.4). Total num frames: 61440. Throughput: 0: 468.5. Samples: 16398. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2023-02-24 17:44:24,564][01148] Avg episode reward: [(0, '4.564')]
[2023-02-24 17:44:28,269][17386] Updated weights for policy 0, policy_version 20 (0.0015)
[2023-02-24 17:44:29,556][01148] Fps is (10 sec: 3686.4, 60 sec: 2150.4, 300 sec: 2150.4). Total num frames: 86016. Throughput: 0: 497.9. Samples: 19916. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-24 17:44:29,559][01148] Avg episode reward: [(0, '4.301')]
[2023-02-24 17:44:34,556][01148] Fps is (10 sec: 4915.1, 60 sec: 2457.6, 300 sec: 2457.6). Total num frames: 110592. Throughput: 0: 602.0. Samples: 27090. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-24 17:44:34,561][01148] Avg episode reward: [(0, '4.197')]
[2023-02-24 17:44:34,572][17373] Saving new best policy, reward=4.197!
[2023-02-24 17:44:38,730][17386] Updated weights for policy 0, policy_version 30 (0.0029)
[2023-02-24 17:44:39,556][01148] Fps is (10 sec: 3686.4, 60 sec: 2457.6, 300 sec: 2457.6). Total num frames: 122880. Throughput: 0: 705.3. Samples: 31740. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 17:44:39,563][01148] Avg episode reward: [(0, '4.328')]
[2023-02-24 17:44:39,580][17373] Saving new best policy, reward=4.328!
[2023-02-24 17:44:44,556][01148] Fps is (10 sec: 2457.6, 60 sec: 2457.6, 300 sec: 2457.6). Total num frames: 135168. Throughput: 0: 744.3. Samples: 33494. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-24 17:44:44,566][01148] Avg episode reward: [(0, '4.328')]
[2023-02-24 17:44:49,556][01148] Fps is (10 sec: 3686.4, 60 sec: 2662.4, 300 sec: 2662.4). Total num frames: 159744. Throughput: 0: 850.4. Samples: 39306. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 17:44:49,559][01148] Avg episode reward: [(0, '4.481')]
[2023-02-24 17:44:49,561][17373] Saving new best policy, reward=4.481!
[2023-02-24 17:44:50,492][17386] Updated weights for policy 0, policy_version 40 (0.0024)
[2023-02-24 17:44:54,558][01148] Fps is (10 sec: 4504.9, 60 sec: 3003.6, 300 sec: 2772.6). Total num frames: 180224. Throughput: 0: 951.5. Samples: 46044. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-24 17:44:54,565][01148] Avg episode reward: [(0, '4.469')]
[2023-02-24 17:44:59,556][01148] Fps is (10 sec: 3686.4, 60 sec: 3276.9, 300 sec: 2808.7). Total num frames: 196608. Throughput: 0: 923.1. Samples: 48304. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 17:44:59,561][01148] Avg episode reward: [(0, '4.467')]
[2023-02-24 17:45:02,195][17386] Updated weights for policy 0, policy_version 50 (0.0021)
[2023-02-24 17:45:04,556][01148] Fps is (10 sec: 2867.7, 60 sec: 3481.6, 300 sec: 2785.3). Total num frames: 208896. Throughput: 0: 903.2. Samples: 52630. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-24 17:45:04,561][01148] Avg episode reward: [(0, '4.344')]
[2023-02-24 17:45:09,556][01148] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 2918.4). Total num frames: 233472. Throughput: 0: 942.0. Samples: 58788. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 17:45:09,558][01148] Avg episode reward: [(0, '4.373')]
[2023-02-24 17:45:12,360][17386] Updated weights for policy 0, policy_version 60 (0.0022)
[2023-02-24 17:45:14,556][01148] Fps is (10 sec: 4505.6, 60 sec: 3686.4, 300 sec: 2987.7). Total num frames: 253952. Throughput: 0: 936.8. Samples: 62074. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-24 17:45:14,559][01148] Avg episode reward: [(0, '4.460')]
[2023-02-24 17:45:19,556][01148] Fps is (10 sec: 3276.7, 60 sec: 3618.1, 300 sec: 2958.2). Total num frames: 266240. Throughput: 0: 890.3. Samples: 67152. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 17:45:19,559][01148] Avg episode reward: [(0, '4.644')]
[2023-02-24 17:45:19,561][17373] Saving new best policy, reward=4.644!
[2023-02-24 17:45:24,557][01148] Fps is (10 sec: 2866.8, 60 sec: 3686.3, 300 sec: 2974.9). Total num frames: 282624. Throughput: 0: 877.5. Samples: 71230. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 17:45:24,563][01148] Avg episode reward: [(0, '4.605')]
[2023-02-24 17:45:24,578][17373] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000069_282624.pth...
[2023-02-24 17:45:25,689][17386] Updated weights for policy 0, policy_version 70 (0.0031)
[2023-02-24 17:45:29,556][01148] Fps is (10 sec: 3686.5, 60 sec: 3618.1, 300 sec: 3031.0). Total num frames: 303104. Throughput: 0: 904.1. Samples: 74178. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-24 17:45:29,562][01148] Avg episode reward: [(0, '4.469')]
[2023-02-24 17:45:34,556][01148] Fps is (10 sec: 4096.6, 60 sec: 3549.9, 300 sec: 3081.8). Total num frames: 323584. Throughput: 0: 927.5. Samples: 81042. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-24 17:45:34,558][01148] Avg episode reward: [(0, '4.311')]
[2023-02-24 17:45:34,768][17386] Updated weights for policy 0, policy_version 80 (0.0030)
[2023-02-24 17:45:39,556][01148] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3090.6). Total num frames: 339968. Throughput: 0: 882.8. Samples: 85768. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 17:45:39,560][01148] Avg episode reward: [(0, '4.384')]
[2023-02-24 17:45:44,557][01148] Fps is (10 sec: 2866.8, 60 sec: 3618.1, 300 sec: 3063.1). Total num frames: 352256. Throughput: 0: 876.8. Samples: 87762. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 17:45:44,567][01148] Avg episode reward: [(0, '4.312')]
[2023-02-24 17:45:47,867][17386] Updated weights for policy 0, policy_version 90 (0.0031)
[2023-02-24 17:45:49,556][01148] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3106.1). Total num frames: 372736. Throughput: 0: 905.3. Samples: 93368. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 17:45:49,558][01148] Avg episode reward: [(0, '4.217')]
[2023-02-24 17:45:54,556][01148] Fps is (10 sec: 4506.2, 60 sec: 3618.2, 300 sec: 3178.5). Total num frames: 397312. Throughput: 0: 922.2. Samples: 100288. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-24 17:45:54,563][01148] Avg episode reward: [(0, '4.460')]
[2023-02-24 17:45:57,173][17386] Updated weights for policy 0, policy_version 100 (0.0012)
[2023-02-24 17:45:59,559][01148] Fps is (10 sec: 4094.6, 60 sec: 3617.9, 300 sec: 3182.2). Total num frames: 413696. Throughput: 0: 908.5. Samples: 102960. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-24 17:45:59,562][01148] Avg episode reward: [(0, '4.685')]
[2023-02-24 17:45:59,565][17373] Saving new best policy, reward=4.685!
[2023-02-24 17:46:04,558][01148] Fps is (10 sec: 2866.5, 60 sec: 3618.0, 300 sec: 3155.4). Total num frames: 425984. Throughput: 0: 890.0. Samples: 107206. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 17:46:04,561][01148] Avg episode reward: [(0, '4.637')]
[2023-02-24 17:46:09,556][01148] Fps is (10 sec: 3277.9, 60 sec: 3549.9, 300 sec: 3189.0). Total num frames: 446464. Throughput: 0: 929.7. Samples: 113066. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 17:46:09,558][01148] Avg episode reward: [(0, '4.623')]
[2023-02-24 17:46:09,587][17386] Updated weights for policy 0, policy_version 110 (0.0023)
[2023-02-24 17:46:14,556][01148] Fps is (10 sec: 4506.7, 60 sec: 3618.1, 300 sec: 3248.6). Total num frames: 471040. Throughput: 0: 934.8. Samples: 116244. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 17:46:14,558][01148] Avg episode reward: [(0, '4.418')]
[2023-02-24 17:46:19,556][01148] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3249.5). Total num frames: 487424. Throughput: 0: 910.8. Samples: 122028. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 17:46:19,558][01148] Avg episode reward: [(0, '4.386')]
[2023-02-24 17:46:20,400][17386] Updated weights for policy 0, policy_version 120 (0.0029)
[2023-02-24 17:46:24,556][01148] Fps is (10 sec: 2867.2, 60 sec: 3618.2, 300 sec: 3223.9). Total num frames: 499712. Throughput: 0: 899.8. Samples: 126258. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-24 17:46:24,558][01148] Avg episode reward: [(0, '4.496')]
[2023-02-24 17:46:29,556][01148] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3251.2). Total num frames: 520192. Throughput: 0: 912.6. Samples: 128830. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-24 17:46:29,563][01148] Avg episode reward: [(0, '4.770')]
[2023-02-24 17:46:29,565][17373] Saving new best policy, reward=4.770!
[2023-02-24 17:46:31,780][17386] Updated weights for policy 0, policy_version 130 (0.0024)
[2023-02-24 17:46:34,556][01148] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3276.8). Total num frames: 540672. Throughput: 0: 935.0. Samples: 135442. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-24 17:46:34,559][01148] Avg episode reward: [(0, '4.737')]
[2023-02-24 17:46:39,556][01148] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3300.9). Total num frames: 561152. Throughput: 0: 900.5. Samples: 140810. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-24 17:46:39,559][01148] Avg episode reward: [(0, '4.496')]
[2023-02-24 17:46:43,751][17386] Updated weights for policy 0, policy_version 140 (0.0027)
[2023-02-24 17:46:44,558][01148] Fps is (10 sec: 3276.0, 60 sec: 3686.3, 300 sec: 3276.8). Total num frames: 573440. Throughput: 0: 887.4. Samples: 142892. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-24 17:46:44,569][01148] Avg episode reward: [(0, '4.638')]
[2023-02-24 17:46:49,556][01148] Fps is (10 sec: 2867.2, 60 sec: 3618.1, 300 sec: 3276.8). Total num frames: 589824. Throughput: 0: 897.8. Samples: 147606. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-24 17:46:49,559][01148] Avg episode reward: [(0, '4.690')]
[2023-02-24 17:46:54,301][17386] Updated weights for policy 0, policy_version 150 (0.0026)
[2023-02-24 17:46:54,556][01148] Fps is (10 sec: 4097.0, 60 sec: 3618.1, 300 sec: 3321.1). Total num frames: 614400. Throughput: 0: 916.8. Samples: 154322. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 17:46:54,564][01148] Avg episode reward: [(0, '4.588')]
[2023-02-24 17:46:59,556][01148] Fps is (10 sec: 4096.0, 60 sec: 3618.3, 300 sec: 3319.9). Total num frames: 630784. Throughput: 0: 919.7. Samples: 157630. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-24 17:46:59,562][01148] Avg episode reward: [(0, '4.629')]
[2023-02-24 17:47:04,556][01148] Fps is (10 sec: 3276.8, 60 sec: 3686.6, 300 sec: 3318.8). Total num frames: 647168. Throughput: 0: 885.2. Samples: 161864. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 17:47:04,561][01148] Avg episode reward: [(0, '4.660')]
[2023-02-24 17:47:07,111][17386] Updated weights for policy 0, policy_version 160 (0.0029)
[2023-02-24 17:47:09,556][01148] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3317.8). Total num frames: 663552. Throughput: 0: 900.0. Samples: 166756. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-24 17:47:09,558][01148] Avg episode reward: [(0, '4.673')]
[2023-02-24 17:47:14,556][01148] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3356.7). Total num frames: 688128. Throughput: 0: 918.2. Samples: 170150. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-24 17:47:14,558][01148] Avg episode reward: [(0, '4.363')]
[2023-02-24 17:47:16,496][17386] Updated weights for policy 0, policy_version 170 (0.0012)
[2023-02-24 17:47:19,556][01148] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3354.8). Total num frames: 704512. Throughput: 0: 910.4. Samples: 176408. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 17:47:19,559][01148] Avg episode reward: [(0, '4.366')]
[2023-02-24 17:47:24,556][01148] Fps is (10 sec: 2867.2, 60 sec: 3618.1, 300 sec: 3334.0). Total num frames: 716800. Throughput: 0: 884.8. Samples: 180624. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-24 17:47:24,562][01148] Avg episode reward: [(0, '4.359')]
[2023-02-24 17:47:24,593][17373] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000176_720896.pth...
[2023-02-24 17:47:29,556][01148] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3332.7). Total num frames: 733184. Throughput: 0: 883.2. Samples: 182632. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-24 17:47:29,564][01148] Avg episode reward: [(0, '4.559')]
[2023-02-24 17:47:29,787][17386] Updated weights for policy 0, policy_version 180 (0.0012)
[2023-02-24 17:47:34,556][01148] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3367.8). Total num frames: 757760. Throughput: 0: 917.6. Samples: 188898. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-24 17:47:34,564][01148] Avg episode reward: [(0, '4.767')]
[2023-02-24 17:47:39,556][01148] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3365.8). Total num frames: 774144. Throughput: 0: 905.4. Samples: 195066. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-24 17:47:39,561][01148] Avg episode reward: [(0, '4.979')]
[2023-02-24 17:47:39,669][17373] Saving new best policy, reward=4.979!
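The periodic `Fps is (...)` / `Avg episode reward: [...]` lines above carry the run's whole learning curve. As a minimal sketch (the regexes are inferred from the log format shown here, not an official Sample Factory API), the frame counts and rewards can be extracted for plotting:

```python
import re

# Patterns inferred from the log lines above (hypothetical helper, not part of Sample Factory).
FPS_LINE = re.compile(r"Fps is \(10 sec: [\d.na]+, 60 sec: [\d.na]+, 300 sec: [\d.na]+\)\. "
                      r"Total num frames: (?P<frames>\d+)\.")
REWARD_LINE = re.compile(r"Avg episode reward: \[\(0, '(?P<reward>[\d.]+)'\)\]")

def parse_progress(lines):
    """Pair each Fps line's total-frame count with the reward line that follows it."""
    progress, frames = [], None
    for line in lines:
        m = FPS_LINE.search(line)
        if m:
            frames = int(m.group("frames"))
            continue
        m = REWARD_LINE.search(line)
        if m and frames is not None:
            progress.append((frames, float(m.group("reward"))))
            frames = None  # consume the pending frame count
    return progress
```

Feeding it the two entries at 17:44:09, for example, yields `[(12288, 2.736)]`.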
[2023-02-24 17:47:39,646][17386] Updated weights for policy 0, policy_version 190 (0.0014)
[2023-02-24 17:47:44,556][01148] Fps is (10 sec: 3276.8, 60 sec: 3618.3, 300 sec: 3363.9). Total num frames: 790528. Throughput: 0: 876.9. Samples: 197090. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 17:47:44,561][01148] Avg episode reward: [(0, '4.904')]
[2023-02-24 17:47:49,556][01148] Fps is (10 sec: 3276.7, 60 sec: 3618.1, 300 sec: 3362.1). Total num frames: 806912. Throughput: 0: 873.9. Samples: 201188. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-24 17:47:49,562][01148] Avg episode reward: [(0, '4.729')]
[2023-02-24 17:47:52,216][17386] Updated weights for policy 0, policy_version 200 (0.0021)
[2023-02-24 17:47:54,556][01148] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3377.1). Total num frames: 827392. Throughput: 0: 907.8. Samples: 207606. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 17:47:54,562][01148] Avg episode reward: [(0, '4.759')]
[2023-02-24 17:47:59,556][01148] Fps is (10 sec: 4096.1, 60 sec: 3618.1, 300 sec: 3391.5). Total num frames: 847872. Throughput: 0: 901.0. Samples: 210696. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-24 17:47:59,558][01148] Avg episode reward: [(0, '5.115')]
[2023-02-24 17:47:59,567][17373] Saving new best policy, reward=5.115!
[2023-02-24 17:48:03,924][17386] Updated weights for policy 0, policy_version 210 (0.0019)
[2023-02-24 17:48:04,556][01148] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3373.2). Total num frames: 860160. Throughput: 0: 863.8. Samples: 215278. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 17:48:04,564][01148] Avg episode reward: [(0, '5.070')]
[2023-02-24 17:48:09,556][01148] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3371.3). Total num frames: 876544. Throughput: 0: 868.8. Samples: 219720. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-24 17:48:09,559][01148] Avg episode reward: [(0, '5.178')]
[2023-02-24 17:48:09,565][17373] Saving new best policy, reward=5.178!
[2023-02-24 17:48:14,556][01148] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3385.0). Total num frames: 897024. Throughput: 0: 897.3. Samples: 223012. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 17:48:14,565][01148] Avg episode reward: [(0, '5.134')]
[2023-02-24 17:48:14,900][17386] Updated weights for policy 0, policy_version 220 (0.0043)
[2023-02-24 17:48:19,556][01148] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3398.2). Total num frames: 917504. Throughput: 0: 906.8. Samples: 229702. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 17:48:19,559][01148] Avg episode reward: [(0, '5.019')]
[2023-02-24 17:48:24,559][01148] Fps is (10 sec: 3685.2, 60 sec: 3617.9, 300 sec: 3395.9). Total num frames: 933888. Throughput: 0: 865.6. Samples: 234020. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-24 17:48:24,570][01148] Avg episode reward: [(0, '5.113')]
[2023-02-24 17:48:27,483][17386] Updated weights for policy 0, policy_version 230 (0.0039)
[2023-02-24 17:48:29,556][01148] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3379.2). Total num frames: 946176. Throughput: 0: 865.1. Samples: 236020. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 17:48:29,559][01148] Avg episode reward: [(0, '5.217')]
[2023-02-24 17:48:29,565][17373] Saving new best policy, reward=5.217!
[2023-02-24 17:48:34,556][01148] Fps is (10 sec: 3277.8, 60 sec: 3481.6, 300 sec: 3391.8). Total num frames: 966656. Throughput: 0: 902.4. Samples: 241794. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-24 17:48:34,559][01148] Avg episode reward: [(0, '5.130')]
[2023-02-24 17:48:37,481][17386] Updated weights for policy 0, policy_version 240 (0.0017)
[2023-02-24 17:48:39,556][01148] Fps is (10 sec: 4505.6, 60 sec: 3618.1, 300 sec: 3418.0). Total num frames: 991232. Throughput: 0: 905.9. Samples: 248372. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-24 17:48:39,559][01148] Avg episode reward: [(0, '5.303')]
[2023-02-24 17:48:39,562][17373] Saving new best policy, reward=5.303!
[2023-02-24 17:48:44,562][01148] Fps is (10 sec: 3684.3, 60 sec: 3549.5, 300 sec: 3401.7). Total num frames: 1003520. Throughput: 0: 881.5. Samples: 250368. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-24 17:48:44,565][01148] Avg episode reward: [(0, '5.472')]
[2023-02-24 17:48:44,586][17373] Saving new best policy, reward=5.472!
[2023-02-24 17:48:49,556][01148] Fps is (10 sec: 2457.5, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 1015808. Throughput: 0: 870.4. Samples: 254448. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 17:48:49,563][01148] Avg episode reward: [(0, '5.614')]
[2023-02-24 17:48:49,574][17373] Saving new best policy, reward=5.614!
[2023-02-24 17:48:50,786][17386] Updated weights for policy 0, policy_version 250 (0.0026)
[2023-02-24 17:48:54,556][01148] Fps is (10 sec: 3278.7, 60 sec: 3481.6, 300 sec: 3512.9). Total num frames: 1036288. Throughput: 0: 905.2. Samples: 260456. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 17:48:54,558][01148] Avg episode reward: [(0, '5.715')]
[2023-02-24 17:48:54,615][17373] Saving new best policy, reward=5.715!
[2023-02-24 17:48:59,556][01148] Fps is (10 sec: 4505.7, 60 sec: 3549.9, 300 sec: 3596.2). Total num frames: 1060864. Throughput: 0: 899.5. Samples: 263488. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 17:48:59,558][01148] Avg episode reward: [(0, '5.821')]
[2023-02-24 17:48:59,566][17373] Saving new best policy, reward=5.821!
[2023-02-24 17:49:01,033][17386] Updated weights for policy 0, policy_version 260 (0.0012)
[2023-02-24 17:49:04,556][01148] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3596.2). Total num frames: 1073152. Throughput: 0: 860.7. Samples: 268434. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-24 17:49:04,559][01148] Avg episode reward: [(0, '5.860')]
[2023-02-24 17:49:04,577][17373] Saving new best policy, reward=5.860!
[2023-02-24 17:49:09,556][01148] Fps is (10 sec: 2457.5, 60 sec: 3481.6, 300 sec: 3568.4). Total num frames: 1085440. Throughput: 0: 859.0. Samples: 272674. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-24 17:49:09,562][01148] Avg episode reward: [(0, '6.007')]
[2023-02-24 17:49:09,566][17373] Saving new best policy, reward=6.007!
[2023-02-24 17:49:13,768][17386] Updated weights for policy 0, policy_version 270 (0.0031)
[2023-02-24 17:49:14,556][01148] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3582.3). Total num frames: 1105920. Throughput: 0: 878.0. Samples: 275530. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 17:49:14,564][01148] Avg episode reward: [(0, '6.144')]
[2023-02-24 17:49:14,574][17373] Saving new best policy, reward=6.144!
[2023-02-24 17:49:19,556][01148] Fps is (10 sec: 4505.7, 60 sec: 3549.9, 300 sec: 3623.9). Total num frames: 1130496. Throughput: 0: 895.2. Samples: 282076. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 17:49:19,558][01148] Avg episode reward: [(0, '6.636')]
[2023-02-24 17:49:19,564][17373] Saving new best policy, reward=6.636!
[2023-02-24 17:49:24,461][17386] Updated weights for policy 0, policy_version 280 (0.0024)
[2023-02-24 17:49:24,556][01148] Fps is (10 sec: 4096.0, 60 sec: 3550.1, 300 sec: 3596.1). Total num frames: 1146880. Throughput: 0: 862.5. Samples: 287186. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 17:49:24,563][01148] Avg episode reward: [(0, '7.060')]
[2023-02-24 17:49:24,576][17373] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000280_1146880.pth...
[2023-02-24 17:49:24,739][17373] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000069_282624.pth
[2023-02-24 17:49:24,772][17373] Saving new best policy, reward=7.060!
[2023-02-24 17:49:29,556][01148] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 1159168. Throughput: 0: 863.9. Samples: 289240. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 17:49:29,559][01148] Avg episode reward: [(0, '7.066')] [2023-02-24 17:49:29,563][17373] Saving new best policy, reward=7.066! [2023-02-24 17:49:34,556][01148] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3582.3). Total num frames: 1179648. Throughput: 0: 890.7. Samples: 294530. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-24 17:49:34,558][01148] Avg episode reward: [(0, '7.734')] [2023-02-24 17:49:34,572][17373] Saving new best policy, reward=7.734! [2023-02-24 17:49:35,986][17386] Updated weights for policy 0, policy_version 290 (0.0020) [2023-02-24 17:49:39,556][01148] Fps is (10 sec: 4505.6, 60 sec: 3549.9, 300 sec: 3623.9). Total num frames: 1204224. Throughput: 0: 912.7. Samples: 301526. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-24 17:49:39,558][01148] Avg episode reward: [(0, '7.980')] [2023-02-24 17:49:39,560][17373] Saving new best policy, reward=7.980! [2023-02-24 17:49:44,558][01148] Fps is (10 sec: 3685.7, 60 sec: 3550.1, 300 sec: 3582.2). Total num frames: 1216512. Throughput: 0: 900.1. Samples: 303994. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 17:49:44,563][01148] Avg episode reward: [(0, '8.188')] [2023-02-24 17:49:44,581][17373] Saving new best policy, reward=8.188! [2023-02-24 17:49:48,016][17386] Updated weights for policy 0, policy_version 300 (0.0013) [2023-02-24 17:49:49,563][01148] Fps is (10 sec: 2455.9, 60 sec: 3549.5, 300 sec: 3554.4). Total num frames: 1228800. Throughput: 0: 880.6. Samples: 308068. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 17:49:49,573][01148] Avg episode reward: [(0, '7.534')] [2023-02-24 17:49:54,556][01148] Fps is (10 sec: 3687.0, 60 sec: 3618.1, 300 sec: 3582.3). Total num frames: 1253376. Throughput: 0: 913.6. Samples: 313788. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-24 17:49:54,565][01148] Avg episode reward: [(0, '7.078')] [2023-02-24 17:49:58,208][17386] Updated weights for policy 0, policy_version 310 (0.0012) [2023-02-24 17:49:59,556][01148] Fps is (10 sec: 4508.6, 60 sec: 3549.9, 300 sec: 3610.0). Total num frames: 1273856. Throughput: 0: 922.1. Samples: 317024. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 17:49:59,563][01148] Avg episode reward: [(0, '7.248')] [2023-02-24 17:50:04,556][01148] Fps is (10 sec: 3276.9, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 1286144. Throughput: 0: 894.0. Samples: 322306. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 17:50:04,567][01148] Avg episode reward: [(0, '7.548')] [2023-02-24 17:50:09,556][01148] Fps is (10 sec: 2457.6, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 1298432. Throughput: 0: 866.0. Samples: 326154. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 17:50:09,562][01148] Avg episode reward: [(0, '7.766')] [2023-02-24 17:50:11,854][17386] Updated weights for policy 0, policy_version 320 (0.0020) [2023-02-24 17:50:14,556][01148] Fps is (10 sec: 3276.6, 60 sec: 3549.8, 300 sec: 3568.4). Total num frames: 1318912. Throughput: 0: 881.0. Samples: 328886. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 17:50:14,560][01148] Avg episode reward: [(0, '7.563')] [2023-02-24 17:50:19,556][01148] Fps is (10 sec: 4505.6, 60 sec: 3549.9, 300 sec: 3596.2). Total num frames: 1343488. Throughput: 0: 908.0. Samples: 335392. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 17:50:19,561][01148] Avg episode reward: [(0, '7.884')] [2023-02-24 17:50:21,578][17386] Updated weights for policy 0, policy_version 330 (0.0026) [2023-02-24 17:50:24,556][01148] Fps is (10 sec: 3686.6, 60 sec: 3481.6, 300 sec: 3568.4). Total num frames: 1355776. Throughput: 0: 864.1. Samples: 340412. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 17:50:24,564][01148] Avg episode reward: [(0, '8.361')] [2023-02-24 17:50:24,580][17373] Saving new best policy, reward=8.361! [2023-02-24 17:50:29,556][01148] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 1372160. Throughput: 0: 854.3. Samples: 342436. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-24 17:50:29,558][01148] Avg episode reward: [(0, '8.433')] [2023-02-24 17:50:29,565][17373] Saving new best policy, reward=8.433! [2023-02-24 17:50:34,138][17386] Updated weights for policy 0, policy_version 340 (0.0031) [2023-02-24 17:50:34,556][01148] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 1392640. Throughput: 0: 883.7. Samples: 347828. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-24 17:50:34,562][01148] Avg episode reward: [(0, '8.531')] [2023-02-24 17:50:34,577][17373] Saving new best policy, reward=8.531! [2023-02-24 17:50:39,556][01148] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3596.2). Total num frames: 1413120. Throughput: 0: 905.3. Samples: 354526. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 17:50:39,559][01148] Avg episode reward: [(0, '9.010')] [2023-02-24 17:50:39,560][17373] Saving new best policy, reward=9.010! [2023-02-24 17:50:44,563][01148] Fps is (10 sec: 3683.7, 60 sec: 3549.5, 300 sec: 3582.2). Total num frames: 1429504. Throughput: 0: 886.1. Samples: 356906. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 17:50:44,569][01148] Avg episode reward: [(0, '9.732')] [2023-02-24 17:50:44,593][17373] Saving new best policy, reward=9.732! [2023-02-24 17:50:45,332][17386] Updated weights for policy 0, policy_version 350 (0.0035) [2023-02-24 17:50:49,556][01148] Fps is (10 sec: 2867.2, 60 sec: 3550.3, 300 sec: 3540.6). Total num frames: 1441792. Throughput: 0: 861.1. Samples: 361056. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-24 17:50:49,563][01148] Avg episode reward: [(0, '9.676')] [2023-02-24 17:50:54,556][01148] Fps is (10 sec: 3279.1, 60 sec: 3481.6, 300 sec: 3554.5). Total num frames: 1462272. Throughput: 0: 903.5. Samples: 366810. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 17:50:54,560][01148] Avg episode reward: [(0, '9.176')] [2023-02-24 17:50:56,577][17386] Updated weights for policy 0, policy_version 360 (0.0015) [2023-02-24 17:50:59,556][01148] Fps is (10 sec: 4505.6, 60 sec: 3549.9, 300 sec: 3596.2). Total num frames: 1486848. Throughput: 0: 917.7. Samples: 370182. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-24 17:50:59,559][01148] Avg episode reward: [(0, '9.273')] [2023-02-24 17:51:04,558][01148] Fps is (10 sec: 4095.5, 60 sec: 3618.0, 300 sec: 3582.2). Total num frames: 1503232. Throughput: 0: 906.1. Samples: 376170. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 17:51:04,563][01148] Avg episode reward: [(0, '9.393')] [2023-02-24 17:51:07,556][17386] Updated weights for policy 0, policy_version 370 (0.0017) [2023-02-24 17:51:09,556][01148] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3554.5). Total num frames: 1519616. Throughput: 0: 895.5. Samples: 380708. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-24 17:51:09,559][01148] Avg episode reward: [(0, '9.681')] [2023-02-24 17:51:14,556][01148] Fps is (10 sec: 3687.0, 60 sec: 3686.4, 300 sec: 3568.4). Total num frames: 1540096. Throughput: 0: 914.8. Samples: 383604. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 17:51:14,563][01148] Avg episode reward: [(0, '9.672')] [2023-02-24 17:51:17,492][17386] Updated weights for policy 0, policy_version 380 (0.0018) [2023-02-24 17:51:19,556][01148] Fps is (10 sec: 4505.6, 60 sec: 3686.4, 300 sec: 3610.0). Total num frames: 1564672. Throughput: 0: 952.1. Samples: 390672. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 17:51:19,559][01148] Avg episode reward: [(0, '9.112')] [2023-02-24 17:51:24,563][01148] Fps is (10 sec: 4092.9, 60 sec: 3754.2, 300 sec: 3596.1). Total num frames: 1581056. Throughput: 0: 927.7. Samples: 396278. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 17:51:24,566][01148] Avg episode reward: [(0, '8.838')] [2023-02-24 17:51:24,585][17373] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000386_1581056.pth... [2023-02-24 17:51:24,721][17373] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000176_720896.pth [2023-02-24 17:51:29,303][17386] Updated weights for policy 0, policy_version 390 (0.0038) [2023-02-24 17:51:29,556][01148] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3582.3). Total num frames: 1597440. Throughput: 0: 923.9. Samples: 398474. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-24 17:51:29,559][01148] Avg episode reward: [(0, '9.282')] [2023-02-24 17:51:34,556][01148] Fps is (10 sec: 3689.2, 60 sec: 3754.7, 300 sec: 3582.3). Total num frames: 1617920. Throughput: 0: 959.7. Samples: 404244. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 17:51:34,558][01148] Avg episode reward: [(0, '10.604')] [2023-02-24 17:51:34,572][17373] Saving new best policy, reward=10.604! [2023-02-24 17:51:38,456][17386] Updated weights for policy 0, policy_version 400 (0.0025) [2023-02-24 17:51:39,556][01148] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3623.9). Total num frames: 1642496. Throughput: 0: 987.7. Samples: 411256. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 17:51:39,559][01148] Avg episode reward: [(0, '11.706')] [2023-02-24 17:51:39,566][17373] Saving new best policy, reward=11.706! [2023-02-24 17:51:44,556][01148] Fps is (10 sec: 4096.0, 60 sec: 3823.4, 300 sec: 3623.9). Total num frames: 1658880. Throughput: 0: 973.3. Samples: 413980. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 17:51:44,558][01148] Avg episode reward: [(0, '12.883')] [2023-02-24 17:51:44,573][17373] Saving new best policy, reward=12.883! [2023-02-24 17:51:49,557][01148] Fps is (10 sec: 2866.8, 60 sec: 3822.8, 300 sec: 3582.2). Total num frames: 1671168. Throughput: 0: 938.0. Samples: 418378. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 17:51:49,562][01148] Avg episode reward: [(0, '12.214')] [2023-02-24 17:51:50,874][17386] Updated weights for policy 0, policy_version 410 (0.0013) [2023-02-24 17:51:54,556][01148] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3610.0). Total num frames: 1695744. Throughput: 0: 974.4. Samples: 424558. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 17:51:54,558][01148] Avg episode reward: [(0, '13.370')] [2023-02-24 17:51:54,575][17373] Saving new best policy, reward=13.370! [2023-02-24 17:51:59,556][01148] Fps is (10 sec: 4506.3, 60 sec: 3822.9, 300 sec: 3623.9). Total num frames: 1716224. Throughput: 0: 990.0. Samples: 428156. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-24 17:51:59,558][01148] Avg episode reward: [(0, '14.177')] [2023-02-24 17:51:59,588][17373] Saving new best policy, reward=14.177! [2023-02-24 17:51:59,591][17386] Updated weights for policy 0, policy_version 420 (0.0021) [2023-02-24 17:52:04,556][01148] Fps is (10 sec: 4095.8, 60 sec: 3891.3, 300 sec: 3637.8). Total num frames: 1736704. Throughput: 0: 964.6. Samples: 434078. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-24 17:52:04,559][01148] Avg episode reward: [(0, '14.299')] [2023-02-24 17:52:04,579][17373] Saving new best policy, reward=14.299! [2023-02-24 17:52:09,556][01148] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3596.1). Total num frames: 1748992. Throughput: 0: 938.5. Samples: 438502. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2023-02-24 17:52:09,563][01148] Avg episode reward: [(0, '14.342')] [2023-02-24 17:52:09,569][17373] Saving new best policy, reward=14.342! [2023-02-24 17:52:11,940][17386] Updated weights for policy 0, policy_version 430 (0.0015) [2023-02-24 17:52:14,556][01148] Fps is (10 sec: 3686.6, 60 sec: 3891.2, 300 sec: 3623.9). Total num frames: 1773568. Throughput: 0: 956.2. Samples: 441502. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-24 17:52:14,560][01148] Avg episode reward: [(0, '14.643')] [2023-02-24 17:52:14,569][17373] Saving new best policy, reward=14.643! [2023-02-24 17:52:19,556][01148] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3651.7). Total num frames: 1794048. Throughput: 0: 986.1. Samples: 448618. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2023-02-24 17:52:19,558][01148] Avg episode reward: [(0, '13.221')] [2023-02-24 17:52:20,459][17386] Updated weights for policy 0, policy_version 440 (0.0020) [2023-02-24 17:52:24,556][01148] Fps is (10 sec: 4096.0, 60 sec: 3891.7, 300 sec: 3665.6). Total num frames: 1814528. Throughput: 0: 953.2. Samples: 454148. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 17:52:24,558][01148] Avg episode reward: [(0, '13.049')] [2023-02-24 17:52:29,556][01148] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3623.9). Total num frames: 1826816. Throughput: 0: 943.1. Samples: 456420. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 17:52:29,563][01148] Avg episode reward: [(0, '13.570')] [2023-02-24 17:52:32,637][17386] Updated weights for policy 0, policy_version 450 (0.0021) [2023-02-24 17:52:34,556][01148] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3651.7). Total num frames: 1851392. Throughput: 0: 974.8. Samples: 462242. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 17:52:34,558][01148] Avg episode reward: [(0, '13.179')] [2023-02-24 17:52:39,556][01148] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3665.6). 
Total num frames: 1871872. Throughput: 0: 994.5. Samples: 469312. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-24 17:52:39,559][01148] Avg episode reward: [(0, '15.002')] [2023-02-24 17:52:39,620][17373] Saving new best policy, reward=15.002! [2023-02-24 17:52:41,940][17386] Updated weights for policy 0, policy_version 460 (0.0018) [2023-02-24 17:52:44,560][01148] Fps is (10 sec: 3685.0, 60 sec: 3822.7, 300 sec: 3665.5). Total num frames: 1888256. Throughput: 0: 973.1. Samples: 471948. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 17:52:44,565][01148] Avg episode reward: [(0, '15.217')] [2023-02-24 17:52:44,655][17373] Saving new best policy, reward=15.217! [2023-02-24 17:52:49,556][01148] Fps is (10 sec: 3276.7, 60 sec: 3891.3, 300 sec: 3651.7). Total num frames: 1904640. Throughput: 0: 939.6. Samples: 476362. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-24 17:52:49,561][01148] Avg episode reward: [(0, '15.316')] [2023-02-24 17:52:49,566][17373] Saving new best policy, reward=15.316! [2023-02-24 17:52:53,821][17386] Updated weights for policy 0, policy_version 470 (0.0014) [2023-02-24 17:52:54,556][01148] Fps is (10 sec: 3687.7, 60 sec: 3822.9, 300 sec: 3651.7). Total num frames: 1925120. Throughput: 0: 979.8. Samples: 482592. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-24 17:52:54,558][01148] Avg episode reward: [(0, '15.470')] [2023-02-24 17:52:54,570][17373] Saving new best policy, reward=15.470! [2023-02-24 17:52:59,556][01148] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3693.3). Total num frames: 1949696. Throughput: 0: 991.7. Samples: 486128. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 17:52:59,558][01148] Avg episode reward: [(0, '13.175')] [2023-02-24 17:53:03,322][17386] Updated weights for policy 0, policy_version 480 (0.0030) [2023-02-24 17:53:04,556][01148] Fps is (10 sec: 4095.8, 60 sec: 3822.9, 300 sec: 3693.3). Total num frames: 1966080. Throughput: 0: 966.9. Samples: 492128. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-24 17:53:04,560][01148] Avg episode reward: [(0, '13.372')] [2023-02-24 17:53:09,556][01148] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3679.5). Total num frames: 1982464. Throughput: 0: 944.9. Samples: 496670. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-24 17:53:09,563][01148] Avg episode reward: [(0, '12.904')] [2023-02-24 17:53:14,255][17386] Updated weights for policy 0, policy_version 490 (0.0018) [2023-02-24 17:53:14,556][01148] Fps is (10 sec: 4096.2, 60 sec: 3891.2, 300 sec: 3693.3). Total num frames: 2007040. Throughput: 0: 967.5. Samples: 499956. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-24 17:53:14,558][01148] Avg episode reward: [(0, '13.276')] [2023-02-24 17:53:19,556][01148] Fps is (10 sec: 4915.3, 60 sec: 3959.5, 300 sec: 3721.2). Total num frames: 2031616. Throughput: 0: 997.1. Samples: 507112. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 17:53:19,561][01148] Avg episode reward: [(0, '14.475')] [2023-02-24 17:53:24,486][17386] Updated weights for policy 0, policy_version 500 (0.0018) [2023-02-24 17:53:24,556][01148] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3735.0). Total num frames: 2048000. Throughput: 0: 959.8. Samples: 512502. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 17:53:24,561][01148] Avg episode reward: [(0, '14.478')] [2023-02-24 17:53:24,574][17373] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000500_2048000.pth... [2023-02-24 17:53:24,721][17373] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000280_1146880.pth [2023-02-24 17:53:29,558][01148] Fps is (10 sec: 2866.5, 60 sec: 3891.1, 300 sec: 3707.2). Total num frames: 2060288. Throughput: 0: 949.7. Samples: 514684. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 17:53:29,561][01148] Avg episode reward: [(0, '15.191')] [2023-02-24 17:53:34,556][01148] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3707.2). Total num frames: 2084864. Throughput: 0: 983.0. Samples: 520598. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-24 17:53:34,558][01148] Avg episode reward: [(0, '16.484')] [2023-02-24 17:53:34,575][17373] Saving new best policy, reward=16.484! [2023-02-24 17:53:35,347][17386] Updated weights for policy 0, policy_version 510 (0.0017) [2023-02-24 17:53:39,556][01148] Fps is (10 sec: 4506.6, 60 sec: 3891.2, 300 sec: 3735.1). Total num frames: 2105344. Throughput: 0: 999.4. Samples: 527564. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 17:53:39,559][01148] Avg episode reward: [(0, '16.805')] [2023-02-24 17:53:39,568][17373] Saving new best policy, reward=16.805! [2023-02-24 17:53:44,556][01148] Fps is (10 sec: 3686.4, 60 sec: 3891.4, 300 sec: 3748.9). Total num frames: 2121728. Throughput: 0: 975.1. Samples: 530006. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 17:53:44,560][01148] Avg episode reward: [(0, '16.580')] [2023-02-24 17:53:46,527][17386] Updated weights for policy 0, policy_version 520 (0.0034) [2023-02-24 17:53:49,556][01148] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3735.0). Total num frames: 2138112. Throughput: 0: 940.6. Samples: 534454. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 17:53:49,564][01148] Avg episode reward: [(0, '17.671')] [2023-02-24 17:53:49,568][17373] Saving new best policy, reward=17.671! [2023-02-24 17:53:54,556][01148] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3721.1). Total num frames: 2158592. Throughput: 0: 978.0. Samples: 540678. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-24 17:53:54,565][01148] Avg episode reward: [(0, '16.524')] [2023-02-24 17:53:56,611][17386] Updated weights for policy 0, policy_version 530 (0.0017) [2023-02-24 17:53:59,556][01148] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3762.8). Total num frames: 2183168. Throughput: 0: 982.6. Samples: 544174. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 17:53:59,564][01148] Avg episode reward: [(0, '18.400')] [2023-02-24 17:53:59,569][17373] Saving new best policy, reward=18.400! [2023-02-24 17:54:04,556][01148] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3776.7). Total num frames: 2199552. Throughput: 0: 951.6. Samples: 549936. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 17:54:04,563][01148] Avg episode reward: [(0, '18.886')] [2023-02-24 17:54:04,585][17373] Saving new best policy, reward=18.886! [2023-02-24 17:54:08,355][17386] Updated weights for policy 0, policy_version 540 (0.0033) [2023-02-24 17:54:09,557][01148] Fps is (10 sec: 2866.8, 60 sec: 3822.8, 300 sec: 3748.9). Total num frames: 2211840. Throughput: 0: 929.4. Samples: 554328. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 17:54:09,562][01148] Avg episode reward: [(0, '18.468')] [2023-02-24 17:54:14,556][01148] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3748.9). Total num frames: 2236416. Throughput: 0: 951.8. Samples: 557512. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2023-02-24 17:54:14,564][01148] Avg episode reward: [(0, '18.905')] [2023-02-24 17:54:14,579][17373] Saving new best policy, reward=18.905! [2023-02-24 17:54:17,605][17386] Updated weights for policy 0, policy_version 550 (0.0018) [2023-02-24 17:54:19,556][01148] Fps is (10 sec: 4915.8, 60 sec: 3822.9, 300 sec: 3776.6). Total num frames: 2260992. Throughput: 0: 979.5. Samples: 564676. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 17:54:19,558][01148] Avg episode reward: [(0, '18.068')] [2023-02-24 17:54:24,556][01148] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 2277376. Throughput: 0: 942.3. Samples: 569966. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 17:54:24,563][01148] Avg episode reward: [(0, '18.571')] [2023-02-24 17:54:29,556][01148] Fps is (10 sec: 2867.3, 60 sec: 3823.1, 300 sec: 3762.8). Total num frames: 2289664. Throughput: 0: 936.6. Samples: 572154. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 17:54:29,562][01148] Avg episode reward: [(0, '18.026')] [2023-02-24 17:54:29,914][17386] Updated weights for policy 0, policy_version 560 (0.0021) [2023-02-24 17:54:34,556][01148] Fps is (10 sec: 3686.2, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 2314240. Throughput: 0: 968.0. Samples: 578016. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-24 17:54:34,560][01148] Avg episode reward: [(0, '19.230')] [2023-02-24 17:54:34,571][17373] Saving new best policy, reward=19.230! [2023-02-24 17:54:38,740][17386] Updated weights for policy 0, policy_version 570 (0.0011) [2023-02-24 17:54:39,556][01148] Fps is (10 sec: 4505.7, 60 sec: 3822.9, 300 sec: 3790.6). Total num frames: 2334720. Throughput: 0: 987.3. Samples: 585106. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-24 17:54:39,563][01148] Avg episode reward: [(0, '19.867')] [2023-02-24 17:54:39,582][17373] Saving new best policy, reward=19.867! [2023-02-24 17:54:44,558][01148] Fps is (10 sec: 3685.7, 60 sec: 3822.8, 300 sec: 3804.5). Total num frames: 2351104. Throughput: 0: 966.8. Samples: 587684. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 17:54:44,561][01148] Avg episode reward: [(0, '19.276')] [2023-02-24 17:54:49,556][01148] Fps is (10 sec: 3276.7, 60 sec: 3822.9, 300 sec: 3776.7). Total num frames: 2367488. Throughput: 0: 936.4. Samples: 592072. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 17:54:49,563][01148] Avg episode reward: [(0, '19.215')] [2023-02-24 17:54:51,232][17386] Updated weights for policy 0, policy_version 580 (0.0019) [2023-02-24 17:54:54,556][01148] Fps is (10 sec: 3687.3, 60 sec: 3822.9, 300 sec: 3776.7). Total num frames: 2387968. Throughput: 0: 981.5. Samples: 598494. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-24 17:54:54,560][01148] Avg episode reward: [(0, '19.418')] [2023-02-24 17:54:59,552][17386] Updated weights for policy 0, policy_version 590 (0.0015) [2023-02-24 17:54:59,568][01148] Fps is (10 sec: 4909.5, 60 sec: 3890.4, 300 sec: 3832.0). Total num frames: 2416640. Throughput: 0: 988.9. Samples: 602026. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 17:54:59,575][01148] Avg episode reward: [(0, '17.624')] [2023-02-24 17:55:04,556][01148] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 2433024. Throughput: 0: 960.1. Samples: 607882. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-24 17:55:04,561][01148] Avg episode reward: [(0, '18.362')] [2023-02-24 17:55:09,556][01148] Fps is (10 sec: 2870.5, 60 sec: 3891.3, 300 sec: 3818.3). Total num frames: 2445312. Throughput: 0: 943.7. Samples: 612434. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-24 17:55:09,564][01148] Avg episode reward: [(0, '18.035')] [2023-02-24 17:55:11,800][17386] Updated weights for policy 0, policy_version 600 (0.0029) [2023-02-24 17:55:14,556][01148] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3818.3). Total num frames: 2469888. Throughput: 0: 966.5. Samples: 615648. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-24 17:55:14,564][01148] Avg episode reward: [(0, '17.220')] [2023-02-24 17:55:19,556][01148] Fps is (10 sec: 4505.6, 60 sec: 3823.0, 300 sec: 3846.1). Total num frames: 2490368. Throughput: 0: 992.2. Samples: 622666. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 17:55:19,562][01148] Avg episode reward: [(0, '16.737')] [2023-02-24 17:55:20,907][17386] Updated weights for policy 0, policy_version 610 (0.0021) [2023-02-24 17:55:24,556][01148] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 2506752. Throughput: 0: 948.0. Samples: 627766. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-24 17:55:24,561][01148] Avg episode reward: [(0, '17.140')] [2023-02-24 17:55:24,572][17373] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000612_2506752.pth... [2023-02-24 17:55:24,757][17373] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000386_1581056.pth [2023-02-24 17:55:29,556][01148] Fps is (10 sec: 3276.7, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 2523136. Throughput: 0: 939.5. Samples: 629960. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 17:55:29,558][01148] Avg episode reward: [(0, '17.837')] [2023-02-24 17:55:32,793][17386] Updated weights for policy 0, policy_version 620 (0.0028) [2023-02-24 17:55:34,556][01148] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 2547712. Throughput: 0: 980.8. Samples: 636206. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 17:55:34,559][01148] Avg episode reward: [(0, '16.754')] [2023-02-24 17:55:39,556][01148] Fps is (10 sec: 4505.7, 60 sec: 3891.2, 300 sec: 3860.1). Total num frames: 2568192. Throughput: 0: 996.2. Samples: 643322. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-24 17:55:39,559][01148] Avg episode reward: [(0, '17.806')] [2023-02-24 17:55:42,200][17386] Updated weights for policy 0, policy_version 630 (0.0015) [2023-02-24 17:55:44,556][01148] Fps is (10 sec: 3686.4, 60 sec: 3891.4, 300 sec: 3873.8). Total num frames: 2584576. Throughput: 0: 970.9. Samples: 645704. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-24 17:55:44,562][01148] Avg episode reward: [(0, '17.224')] [2023-02-24 17:55:49,556][01148] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 2600960. Throughput: 0: 940.2. Samples: 650192. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-24 17:55:49,559][01148] Avg episode reward: [(0, '16.468')] [2023-02-24 17:55:53,457][17386] Updated weights for policy 0, policy_version 640 (0.0027) [2023-02-24 17:55:54,556][01148] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 2625536. Throughput: 0: 990.4. Samples: 657000. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-24 17:55:54,558][01148] Avg episode reward: [(0, '16.880')] [2023-02-24 17:55:59,559][01148] Fps is (10 sec: 4913.5, 60 sec: 3891.7, 300 sec: 3887.7). Total num frames: 2650112. Throughput: 0: 1000.1. Samples: 660656. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-24 17:55:59,570][01148] Avg episode reward: [(0, '16.084')] [2023-02-24 17:56:03,198][17386] Updated weights for policy 0, policy_version 650 (0.0011) [2023-02-24 17:56:04,562][01148] Fps is (10 sec: 3684.2, 60 sec: 3822.6, 300 sec: 3873.8). Total num frames: 2662400. Throughput: 0: 965.3. Samples: 666108. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-24 17:56:04,570][01148] Avg episode reward: [(0, '16.431')] [2023-02-24 17:56:09,556][01148] Fps is (10 sec: 2868.2, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 2678784. Throughput: 0: 955.4. Samples: 670760. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-24 17:56:09,559][01148] Avg episode reward: [(0, '16.334')] [2023-02-24 17:56:13,968][17386] Updated weights for policy 0, policy_version 660 (0.0016) [2023-02-24 17:56:14,556][01148] Fps is (10 sec: 4098.4, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 2703360. Throughput: 0: 988.1. Samples: 674422. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-24 17:56:14,559][01148] Avg episode reward: [(0, '16.236')]
[2023-02-24 17:56:19,556][01148] Fps is (10 sec: 4915.2, 60 sec: 3959.5, 300 sec: 3887.8). Total num frames: 2727936. Throughput: 0: 1012.3. Samples: 681758. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 17:56:19,559][01148] Avg episode reward: [(0, '16.641')]
[2023-02-24 17:56:24,540][17386] Updated weights for policy 0, policy_version 670 (0.0022)
[2023-02-24 17:56:24,556][01148] Fps is (10 sec: 4095.8, 60 sec: 3959.4, 300 sec: 3887.7). Total num frames: 2744320. Throughput: 0: 960.9. Samples: 686564. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 17:56:24,559][01148] Avg episode reward: [(0, '17.327')]
[2023-02-24 17:56:29,556][01148] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 2760704. Throughput: 0: 959.3. Samples: 688874. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 17:56:29,558][01148] Avg episode reward: [(0, '18.024')]
[2023-02-24 17:56:34,556][01148] Fps is (10 sec: 3686.6, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 2781184. Throughput: 0: 1004.7. Samples: 695404. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 17:56:34,558][01148] Avg episode reward: [(0, '19.020')]
[2023-02-24 17:56:34,667][17386] Updated weights for policy 0, policy_version 680 (0.0020)
[2023-02-24 17:56:39,562][01148] Fps is (10 sec: 4502.7, 60 sec: 3959.0, 300 sec: 3887.6). Total num frames: 2805760. Throughput: 0: 1008.0. Samples: 702368. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-24 17:56:39,565][01148] Avg episode reward: [(0, '18.740')]
[2023-02-24 17:56:44,556][01148] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 2818048. Throughput: 0: 975.5. Samples: 704552. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 17:56:44,564][01148] Avg episode reward: [(0, '19.196')]
[2023-02-24 17:56:46,083][17386] Updated weights for policy 0, policy_version 690 (0.0023)
[2023-02-24 17:56:49,556][01148] Fps is (10 sec: 2869.1, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 2834432. Throughput: 0: 951.8. Samples: 708932. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 17:56:49,568][01148] Avg episode reward: [(0, '19.700')]
[2023-02-24 17:56:54,556][01148] Fps is (10 sec: 4095.9, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 2859008. Throughput: 0: 996.1. Samples: 715586. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 17:56:54,559][01148] Avg episode reward: [(0, '19.039')]
[2023-02-24 17:56:55,963][17386] Updated weights for policy 0, policy_version 700 (0.0026)
[2023-02-24 17:56:59,556][01148] Fps is (10 sec: 4505.6, 60 sec: 3823.2, 300 sec: 3873.9). Total num frames: 2879488. Throughput: 0: 992.0. Samples: 719060. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 17:56:59,561][01148] Avg episode reward: [(0, '19.932')]
[2023-02-24 17:56:59,563][17373] Saving new best policy, reward=19.932!
[2023-02-24 17:57:04,559][01148] Fps is (10 sec: 3685.2, 60 sec: 3891.4, 300 sec: 3887.7). Total num frames: 2895872. Throughput: 0: 939.1. Samples: 724022. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 17:57:04,562][01148] Avg episode reward: [(0, '20.828')]
[2023-02-24 17:57:04,572][17373] Saving new best policy, reward=20.828!
[2023-02-24 17:57:08,390][17386] Updated weights for policy 0, policy_version 710 (0.0022)
[2023-02-24 17:57:09,556][01148] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 2912256. Throughput: 0: 934.4. Samples: 728610. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 17:57:09,566][01148] Avg episode reward: [(0, '20.752')]
[2023-02-24 17:57:14,556][01148] Fps is (10 sec: 3687.7, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 2932736. Throughput: 0: 959.2. Samples: 732038. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 17:57:14,563][01148] Avg episode reward: [(0, '21.231')]
[2023-02-24 17:57:14,574][17373] Saving new best policy, reward=21.231!
[2023-02-24 17:57:17,319][17386] Updated weights for policy 0, policy_version 720 (0.0024)
[2023-02-24 17:57:19,556][01148] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3873.8). Total num frames: 2957312. Throughput: 0: 970.1. Samples: 739060. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 17:57:19,566][01148] Avg episode reward: [(0, '21.650')]
[2023-02-24 17:57:19,568][17373] Saving new best policy, reward=21.650!
[2023-02-24 17:57:24,556][01148] Fps is (10 sec: 3686.3, 60 sec: 3754.7, 300 sec: 3873.8). Total num frames: 2969600. Throughput: 0: 915.6. Samples: 743562. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 17:57:24,559][01148] Avg episode reward: [(0, '21.040')]
[2023-02-24 17:57:24,579][17373] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000725_2969600.pth...
[2023-02-24 17:57:24,771][17373] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000500_2048000.pth
[2023-02-24 17:57:29,556][01148] Fps is (10 sec: 2867.2, 60 sec: 3754.7, 300 sec: 3846.1). Total num frames: 2985984. Throughput: 0: 915.1. Samples: 745730. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 17:57:29,558][01148] Avg episode reward: [(0, '19.627')]
[2023-02-24 17:57:29,678][17386] Updated weights for policy 0, policy_version 730 (0.0026)
[2023-02-24 17:57:34,556][01148] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 3010560. Throughput: 0: 965.3. Samples: 752370. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 17:57:34,558][01148] Avg episode reward: [(0, '20.362')]
[2023-02-24 17:57:38,509][17386] Updated weights for policy 0, policy_version 740 (0.0019)
[2023-02-24 17:57:39,556][01148] Fps is (10 sec: 4505.6, 60 sec: 3755.1, 300 sec: 3873.9). Total num frames: 3031040. Throughput: 0: 963.9. Samples: 758962. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 17:57:39,558][01148] Avg episode reward: [(0, '22.024')]
[2023-02-24 17:57:39,568][17373] Saving new best policy, reward=22.024!
[2023-02-24 17:57:44,556][01148] Fps is (10 sec: 3686.3, 60 sec: 3822.9, 300 sec: 3873.8). Total num frames: 3047424. Throughput: 0: 934.7. Samples: 761120. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 17:57:44,563][01148] Avg episode reward: [(0, '21.483')]
[2023-02-24 17:57:49,556][01148] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 3063808. Throughput: 0: 925.9. Samples: 765684. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-24 17:57:49,563][01148] Avg episode reward: [(0, '21.228')]
[2023-02-24 17:57:50,864][17386] Updated weights for policy 0, policy_version 750 (0.0032)
[2023-02-24 17:57:54,556][01148] Fps is (10 sec: 4096.1, 60 sec: 3823.0, 300 sec: 3860.0). Total num frames: 3088384. Throughput: 0: 979.8. Samples: 772700. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-24 17:57:54,564][01148] Avg episode reward: [(0, '20.642')]
[2023-02-24 17:57:59,556][01148] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3873.9). Total num frames: 3108864. Throughput: 0: 983.2. Samples: 776280. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-24 17:57:59,561][01148] Avg episode reward: [(0, '21.722')]
[2023-02-24 17:58:00,633][17386] Updated weights for policy 0, policy_version 760 (0.0019)
[2023-02-24 17:58:04,561][01148] Fps is (10 sec: 3275.1, 60 sec: 3754.6, 300 sec: 3859.9). Total num frames: 3121152. Throughput: 0: 931.8. Samples: 780994. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-24 17:58:04,564][01148] Avg episode reward: [(0, '20.754')]
[2023-02-24 17:58:09,556][01148] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 3141632. Throughput: 0: 947.2. Samples: 786188. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-24 17:58:09,563][01148] Avg episode reward: [(0, '20.690')]
[2023-02-24 17:58:11,861][17386] Updated weights for policy 0, policy_version 770 (0.0014)
[2023-02-24 17:58:14,556][01148] Fps is (10 sec: 4508.0, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 3166208. Throughput: 0: 978.9. Samples: 789780. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-24 17:58:14,561][01148] Avg episode reward: [(0, '21.240')]
[2023-02-24 17:58:19,561][01148] Fps is (10 sec: 4503.4, 60 sec: 3822.6, 300 sec: 3859.9). Total num frames: 3186688. Throughput: 0: 987.1. Samples: 796794. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 17:58:19,574][01148] Avg episode reward: [(0, '21.809')]
[2023-02-24 17:58:21,466][17386] Updated weights for policy 0, policy_version 780 (0.0019)
[2023-02-24 17:58:24,557][01148] Fps is (10 sec: 3686.2, 60 sec: 3891.2, 300 sec: 3873.9). Total num frames: 3203072. Throughput: 0: 942.3. Samples: 801366. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-24 17:58:24,559][01148] Avg episode reward: [(0, '21.501')]
[2023-02-24 17:58:29,556][01148] Fps is (10 sec: 3278.4, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 3219456. Throughput: 0: 947.0. Samples: 803734. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-24 17:58:29,562][01148] Avg episode reward: [(0, '21.593')]
[2023-02-24 17:58:32,183][17386] Updated weights for policy 0, policy_version 790 (0.0024)
[2023-02-24 17:58:34,556][01148] Fps is (10 sec: 4096.3, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 3244032. Throughput: 0: 1004.7. Samples: 810896. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 17:58:34,559][01148] Avg episode reward: [(0, '23.697')]
[2023-02-24 17:58:34,575][17373] Saving new best policy, reward=23.697!
[2023-02-24 17:58:39,559][01148] Fps is (10 sec: 4504.0, 60 sec: 3891.0, 300 sec: 3873.8). Total num frames: 3264512. Throughput: 0: 990.9. Samples: 817292. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 17:58:39,562][01148] Avg episode reward: [(0, '24.531')]
[2023-02-24 17:58:39,569][17373] Saving new best policy, reward=24.531!
[2023-02-24 17:58:42,668][17386] Updated weights for policy 0, policy_version 800 (0.0011)
[2023-02-24 17:58:44,556][01148] Fps is (10 sec: 3686.3, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 3280896. Throughput: 0: 959.5. Samples: 819458. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 17:58:44,563][01148] Avg episode reward: [(0, '24.368')]
[2023-02-24 17:58:49,556][01148] Fps is (10 sec: 3277.9, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 3297280. Throughput: 0: 965.6. Samples: 824440. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 17:58:49,558][01148] Avg episode reward: [(0, '25.154')]
[2023-02-24 17:58:49,598][17373] Saving new best policy, reward=25.154!
[2023-02-24 17:58:53,108][17386] Updated weights for policy 0, policy_version 810 (0.0017)
[2023-02-24 17:58:54,556][01148] Fps is (10 sec: 4096.1, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 3321856. Throughput: 0: 1008.4. Samples: 831564. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 17:58:54,559][01148] Avg episode reward: [(0, '24.594')]
[2023-02-24 17:58:59,559][01148] Fps is (10 sec: 4504.0, 60 sec: 3891.0, 300 sec: 3873.8). Total num frames: 3342336. Throughput: 0: 1007.4. Samples: 835118. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 17:58:59,562][01148] Avg episode reward: [(0, '25.622')]
[2023-02-24 17:58:59,567][17373] Saving new best policy, reward=25.622!
[2023-02-24 17:59:04,251][17386] Updated weights for policy 0, policy_version 820 (0.0018)
[2023-02-24 17:59:04,558][01148] Fps is (10 sec: 3685.8, 60 sec: 3959.7, 300 sec: 3887.7). Total num frames: 3358720. Throughput: 0: 950.4. Samples: 839558. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 17:59:04,560][01148] Avg episode reward: [(0, '25.114')]
[2023-02-24 17:59:09,556][01148] Fps is (10 sec: 3277.9, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 3375104. Throughput: 0: 974.2. Samples: 845204. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 17:59:09,559][01148] Avg episode reward: [(0, '25.109')]
[2023-02-24 17:59:13,866][17386] Updated weights for policy 0, policy_version 830 (0.0013)
[2023-02-24 17:59:14,556][01148] Fps is (10 sec: 4096.8, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 3399680. Throughput: 0: 999.5. Samples: 848712. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 17:59:14,558][01148] Avg episode reward: [(0, '26.393')]
[2023-02-24 17:59:14,586][17373] Saving new best policy, reward=26.393!
[2023-02-24 17:59:19,556][01148] Fps is (10 sec: 4505.6, 60 sec: 3891.5, 300 sec: 3873.8). Total num frames: 3420160. Throughput: 0: 984.8. Samples: 855210. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 17:59:19,558][01148] Avg episode reward: [(0, '24.575')]
[2023-02-24 17:59:24,556][01148] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 3436544. Throughput: 0: 943.8. Samples: 859758. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-24 17:59:24,563][01148] Avg episode reward: [(0, '23.444')]
[2023-02-24 17:59:24,578][17373] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000839_3436544.pth...
[2023-02-24 17:59:24,735][17373] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000612_2506752.pth
[2023-02-24 17:59:25,651][17386] Updated weights for policy 0, policy_version 840 (0.0020)
[2023-02-24 17:59:29,556][01148] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3873.9). Total num frames: 3457024. Throughput: 0: 954.5. Samples: 862410. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 17:59:29,563][01148] Avg episode reward: [(0, '22.609')]
[2023-02-24 17:59:34,429][17386] Updated weights for policy 0, policy_version 850 (0.0023)
[2023-02-24 17:59:34,556][01148] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 3481600. Throughput: 0: 1004.6. Samples: 869648. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 17:59:34,564][01148] Avg episode reward: [(0, '22.131')]
[2023-02-24 17:59:39,556][01148] Fps is (10 sec: 4096.0, 60 sec: 3891.4, 300 sec: 3887.8). Total num frames: 3497984. Throughput: 0: 979.6. Samples: 875648. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 17:59:39,562][01148] Avg episode reward: [(0, '21.698')]
[2023-02-24 17:59:44,556][01148] Fps is (10 sec: 3276.9, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 3514368. Throughput: 0: 950.2. Samples: 877872. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 17:59:44,558][01148] Avg episode reward: [(0, '21.705')]
[2023-02-24 17:59:46,581][17386] Updated weights for policy 0, policy_version 860 (0.0011)
[2023-02-24 17:59:49,556][01148] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 3534848. Throughput: 0: 974.0. Samples: 883388. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 17:59:49,558][01148] Avg episode reward: [(0, '22.729')]
[2023-02-24 17:59:54,556][01148] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3874.0). Total num frames: 3559424. Throughput: 0: 1012.4. Samples: 890764. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 17:59:54,561][01148] Avg episode reward: [(0, '22.455')]
[2023-02-24 17:59:55,107][17386] Updated weights for policy 0, policy_version 870 (0.0017)
[2023-02-24 17:59:59,556][01148] Fps is (10 sec: 4096.0, 60 sec: 3891.4, 300 sec: 3873.8). Total num frames: 3575808. Throughput: 0: 1004.8. Samples: 893928. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2023-02-24 17:59:59,559][01148] Avg episode reward: [(0, '23.079')]
[2023-02-24 18:00:04,556][01148] Fps is (10 sec: 3276.8, 60 sec: 3891.3, 300 sec: 3887.7). Total num frames: 3592192. Throughput: 0: 963.2. Samples: 898556. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 18:00:04,565][01148] Avg episode reward: [(0, '22.249')]
[2023-02-24 18:00:06,919][17386] Updated weights for policy 0, policy_version 880 (0.0024)
[2023-02-24 18:00:09,556][01148] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3887.7). Total num frames: 3616768. Throughput: 0: 1001.0. Samples: 904802. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2023-02-24 18:00:09,558][01148] Avg episode reward: [(0, '22.306')]
[2023-02-24 18:00:14,556][01148] Fps is (10 sec: 4915.3, 60 sec: 4027.7, 300 sec: 3901.6). Total num frames: 3641344. Throughput: 0: 1023.3. Samples: 908460. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2023-02-24 18:00:14,559][01148] Avg episode reward: [(0, '22.922')]
[2023-02-24 18:00:15,380][17386] Updated weights for policy 0, policy_version 890 (0.0026)
[2023-02-24 18:00:19,556][01148] Fps is (10 sec: 4095.9, 60 sec: 3959.4, 300 sec: 3901.6). Total num frames: 3657728. Throughput: 0: 1001.4. Samples: 914712. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-24 18:00:19,561][01148] Avg episode reward: [(0, '22.920')]
[2023-02-24 18:00:24,556][01148] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 3901.6). Total num frames: 3674112. Throughput: 0: 969.1. Samples: 919256. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 18:00:24,560][01148] Avg episode reward: [(0, '23.280')]
[2023-02-24 18:00:27,410][17386] Updated weights for policy 0, policy_version 900 (0.0012)
[2023-02-24 18:00:29,556][01148] Fps is (10 sec: 3686.5, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 3694592. Throughput: 0: 984.9. Samples: 922192. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 18:00:29,562][01148] Avg episode reward: [(0, '22.797')]
[2023-02-24 18:00:34,556][01148] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3901.6). Total num frames: 3719168. Throughput: 0: 1019.0. Samples: 929244. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 18:00:34,559][01148] Avg episode reward: [(0, '21.667')]
[2023-02-24 18:00:36,307][17386] Updated weights for policy 0, policy_version 910 (0.0017)
[2023-02-24 18:00:39,556][01148] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3901.6). Total num frames: 3735552. Throughput: 0: 976.7. Samples: 934716. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 18:00:39,560][01148] Avg episode reward: [(0, '20.678')]
[2023-02-24 18:00:44,556][01148] Fps is (10 sec: 2867.2, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 3747840. Throughput: 0: 955.7. Samples: 936936. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-24 18:00:44,558][01148] Avg episode reward: [(0, '20.615')]
[2023-02-24 18:00:48,413][17386] Updated weights for policy 0, policy_version 920 (0.0038)
[2023-02-24 18:00:49,556][01148] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 3772416. Throughput: 0: 978.5. Samples: 942590. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 18:00:49,558][01148] Avg episode reward: [(0, '20.405')]
[2023-02-24 18:00:54,556][01148] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3873.9). Total num frames: 3792896. Throughput: 0: 997.1. Samples: 949672. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 18:00:54,561][01148] Avg episode reward: [(0, '20.880')]
[2023-02-24 18:00:58,275][17386] Updated weights for policy 0, policy_version 930 (0.0012)
[2023-02-24 18:00:59,556][01148] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3901.7). Total num frames: 3813376. Throughput: 0: 975.1. Samples: 952338. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-24 18:00:59,558][01148] Avg episode reward: [(0, '20.815')]
[2023-02-24 18:01:04,556][01148] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 3825664. Throughput: 0: 932.8. Samples: 956688. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 18:01:04,559][01148] Avg episode reward: [(0, '21.752')]
[2023-02-24 18:01:09,424][17386] Updated weights for policy 0, policy_version 940 (0.0026)
[2023-02-24 18:01:09,556][01148] Fps is (10 sec: 3686.3, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 3850240. Throughput: 0: 973.3. Samples: 963056. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 18:01:09,564][01148] Avg episode reward: [(0, '21.982')]
[2023-02-24 18:01:14,560][01148] Fps is (10 sec: 4503.6, 60 sec: 3822.6, 300 sec: 3873.8). Total num frames: 3870720. Throughput: 0: 988.0. Samples: 966656. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 18:01:14,563][01148] Avg episode reward: [(0, '22.845')]
[2023-02-24 18:01:19,555][17386] Updated weights for policy 0, policy_version 950 (0.0018)
[2023-02-24 18:01:19,556][01148] Fps is (10 sec: 4096.1, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 3891200. Throughput: 0: 958.3. Samples: 972368. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-24 18:01:19,563][01148] Avg episode reward: [(0, '21.580')]
[2023-02-24 18:01:24,556][01148] Fps is (10 sec: 3278.3, 60 sec: 3822.9, 300 sec: 3873.8). Total num frames: 3903488. Throughput: 0: 939.2. Samples: 976978. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-24 18:01:24,559][01148] Avg episode reward: [(0, '21.398')]
[2023-02-24 18:01:24,576][17373] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000953_3903488.pth...
[2023-02-24 18:01:24,727][17373] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000725_2969600.pth
[2023-02-24 18:01:29,556][01148] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 3928064. Throughput: 0: 960.3. Samples: 980150. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-24 18:01:29,562][01148] Avg episode reward: [(0, '22.169')]
[2023-02-24 18:01:30,235][17386] Updated weights for policy 0, policy_version 960 (0.0013)
[2023-02-24 18:01:34,556][01148] Fps is (10 sec: 4915.2, 60 sec: 3891.2, 300 sec: 3887.8). Total num frames: 3952640. Throughput: 0: 995.2. Samples: 987376. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 18:01:34,558][01148] Avg episode reward: [(0, '21.707')]
[2023-02-24 18:01:39,560][01148] Fps is (10 sec: 4094.2, 60 sec: 3890.9, 300 sec: 3901.6). Total num frames: 3969024. Throughput: 0: 955.0. Samples: 992650. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 18:01:39,563][01148] Avg episode reward: [(0, '21.851')]
[2023-02-24 18:01:40,838][17386] Updated weights for policy 0, policy_version 970 (0.0015)
[2023-02-24 18:01:44,557][01148] Fps is (10 sec: 2866.9, 60 sec: 3891.1, 300 sec: 3887.7). Total num frames: 3981312. Throughput: 0: 945.1. Samples: 994868. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-24 18:01:44,561][01148] Avg episode reward: [(0, '21.829')]
[2023-02-24 18:01:49,223][17373] Stopping Batcher_0...
[2023-02-24 18:01:49,224][17373] Loop batcher_evt_loop terminating...
[2023-02-24 18:01:49,225][01148] Component Batcher_0 stopped!
[2023-02-24 18:01:49,230][17373] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
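The Saving/Removing pairs in the log above (e.g. saving `checkpoint_000000953_3903488.pth` while removing `checkpoint_000000725_2969600.pth`) show simple checkpoint rotation: each new checkpoint evicts the oldest one beyond a retention limit. A minimal sketch of that behavior, assuming a keep-last-2 policy; the function name and `keep_last` parameter are illustrative, and a real implementation would `torch.save` a state dict rather than write an empty file:

```python
from pathlib import Path


def save_with_rotation(train_dir, policy_version, env_steps, keep_last=2):
    """Write a (stub) checkpoint file and delete the oldest ones beyond keep_last."""
    path = Path(train_dir) / "checkpoint_p0" / f"checkpoint_{policy_version:09d}_{env_steps}.pth"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(b"")  # stand-in for torch.save(state_dict, path)

    # Zero-padded version numbers make lexicographic sort equal to version order.
    checkpoints = sorted(path.parent.glob("checkpoint_*.pth"))
    removed = []
    for old in checkpoints[:-keep_last]:
        old.unlink()
        removed.append(old.name)
    return path.name, removed
```

With the version/step pairs from the log, the third save would remove the first checkpoint, matching the Saving/Removing sequence above.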
[2023-02-24 18:01:49,266][17392] Stopping RolloutWorker_w4...
[2023-02-24 18:01:49,266][01148] Component RolloutWorker_w4 stopped!
[2023-02-24 18:01:49,280][17386] Weights refcount: 2 0
[2023-02-24 18:01:49,283][17392] Loop rollout_proc4_evt_loop terminating...
[2023-02-24 18:01:49,285][01148] Component RolloutWorker_w3 stopped!
[2023-02-24 18:01:49,289][01148] Component InferenceWorker_p0-w0 stopped!
[2023-02-24 18:01:49,285][17386] Stopping InferenceWorker_p0-w0...
[2023-02-24 18:01:49,286][17388] Stopping RolloutWorker_w0...
[2023-02-24 18:01:49,291][17386] Loop inference_proc0-0_evt_loop terminating...
[2023-02-24 18:01:49,294][17394] Stopping RolloutWorker_w6...
[2023-02-24 18:01:49,291][01148] Component RolloutWorker_w0 stopped!
[2023-02-24 18:01:49,296][17388] Loop rollout_proc0_evt_loop terminating...
[2023-02-24 18:01:49,299][17394] Loop rollout_proc6_evt_loop terminating...
[2023-02-24 18:01:49,300][01148] Component RolloutWorker_w6 stopped!
[2023-02-24 18:01:49,304][17393] Stopping RolloutWorker_w5...
[2023-02-24 18:01:49,304][17393] Loop rollout_proc5_evt_loop terminating...
[2023-02-24 18:01:49,304][01148] Component RolloutWorker_w5 stopped!
[2023-02-24 18:01:49,310][17389] Stopping RolloutWorker_w1...
[2023-02-24 18:01:49,310][17389] Loop rollout_proc1_evt_loop terminating...
[2023-02-24 18:01:49,310][01148] Component RolloutWorker_w1 stopped!
[2023-02-24 18:01:49,291][17391] Stopping RolloutWorker_w3...
[2023-02-24 18:01:49,317][17391] Loop rollout_proc3_evt_loop terminating...
[2023-02-24 18:01:49,338][17395] Stopping RolloutWorker_w7...
[2023-02-24 18:01:49,338][01148] Component RolloutWorker_w7 stopped!
[2023-02-24 18:01:49,341][01148] Component RolloutWorker_w2 stopped!
[2023-02-24 18:01:49,339][17395] Loop rollout_proc7_evt_loop terminating...
[2023-02-24 18:01:49,340][17390] Stopping RolloutWorker_w2...
[2023-02-24 18:01:49,353][17390] Loop rollout_proc2_evt_loop terminating...
[2023-02-24 18:01:49,420][17373] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000839_3436544.pth
[2023-02-24 18:01:49,429][17373] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2023-02-24 18:01:49,576][17373] Stopping LearnerWorker_p0...
[2023-02-24 18:01:49,578][17373] Loop learner_proc0_evt_loop terminating...
[2023-02-24 18:01:49,575][01148] Component LearnerWorker_p0 stopped!
[2023-02-24 18:01:49,582][01148] Waiting for process learner_proc0 to stop...
[2023-02-24 18:01:51,371][01148] Waiting for process inference_proc0-0 to join...
[2023-02-24 18:01:51,767][01148] Waiting for process rollout_proc0 to join...
[2023-02-24 18:01:52,181][01148] Waiting for process rollout_proc1 to join...
[2023-02-24 18:01:52,183][01148] Waiting for process rollout_proc2 to join...
[2023-02-24 18:01:52,185][01148] Waiting for process rollout_proc3 to join...
[2023-02-24 18:01:52,189][01148] Waiting for process rollout_proc4 to join...
[2023-02-24 18:01:52,191][01148] Waiting for process rollout_proc5 to join...
[2023-02-24 18:01:52,194][01148] Waiting for process rollout_proc6 to join...
[2023-02-24 18:01:52,195][01148] Waiting for process rollout_proc7 to join...
[2023-02-24 18:01:52,197][01148] Batcher 0 profile tree view:
batching: 25.7095, releasing_batches: 0.0259
[2023-02-24 18:01:52,199][01148] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0000
  wait_policy_total: 524.9289
update_model: 7.9864
  weight_update: 0.0034
one_step: 0.0095
  handle_policy_step: 503.3729
    deserialize: 14.7592, stack: 2.9412, obs_to_device_normalize: 113.1554, forward: 240.2422, send_messages: 26.3998
    prepare_outputs: 80.9816
      to_cpu: 50.4132
[2023-02-24 18:01:52,200][01148] Learner 0 profile tree view:
misc: 0.0055, prepare_batch: 15.9028
train: 75.4042
  epoch_init: 0.0055, minibatch_init: 0.0094, losses_postprocess: 0.5886, kl_divergence: 0.5394, after_optimizer: 32.7933
  calculate_losses: 26.6513
    losses_init: 0.0046, forward_head: 1.8046, bptt_initial: 17.4902, tail: 1.0352, advantages_returns: 0.2774, losses: 3.4127
    bptt: 2.3307
      bptt_forward_core: 2.2586
  update: 14.1689
    clip: 1.4045
[2023-02-24 18:01:52,202][01148] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.3231, enqueue_policy_requests: 137.7886, env_step: 813.6080, overhead: 20.0027, complete_rollouts: 7.0781
save_policy_outputs: 19.3052
  split_output_tensors: 9.3305
[2023-02-24 18:01:52,203][01148] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.2628, enqueue_policy_requests: 146.6016, env_step: 807.1904, overhead: 19.2301, complete_rollouts: 6.8866
save_policy_outputs: 19.1004
  split_output_tensors: 9.4689
[2023-02-24 18:01:52,205][01148] Loop Runner_EvtLoop terminating...
[2023-02-24 18:01:52,206][01148] Runner profile tree view:
main_loop: 1101.8385
[2023-02-24 18:01:52,208][01148] Collected {0: 4005888}, FPS: 3635.6
[2023-02-24 18:01:57,593][01148] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2023-02-24 18:01:57,595][01148] Overriding arg 'num_workers' with value 1 passed from command line
[2023-02-24 18:01:57,598][01148] Adding new argument 'no_render'=True that is not in the saved config file!
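The final summary line can be sanity-checked: the reported overall FPS is simply the total collected frames divided by the main-loop wall time from the Runner profile. A tiny sketch using the two numbers printed in the log:

```python
def overall_fps(total_frames: int, wall_time_s: float) -> float:
    """Average environment frames per second over the whole run."""
    return total_frames / wall_time_s


# Numbers from the log: Collected {0: 4005888} frames, main_loop: 1101.8385 s.
print(round(overall_fps(4005888, 1101.8385), 1))  # 3635.6, matching the reported FPS
```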
[2023-02-24 18:01:57,601][01148] Adding new argument 'save_video'=True that is not in the saved config file! [2023-02-24 18:01:57,604][01148] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2023-02-24 18:01:57,606][01148] Adding new argument 'video_name'=None that is not in the saved config file! [2023-02-24 18:01:57,608][01148] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! [2023-02-24 18:01:57,609][01148] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2023-02-24 18:01:57,611][01148] Adding new argument 'push_to_hub'=False that is not in the saved config file! [2023-02-24 18:01:57,612][01148] Adding new argument 'hf_repository'=None that is not in the saved config file! [2023-02-24 18:01:57,613][01148] Adding new argument 'policy_index'=0 that is not in the saved config file! [2023-02-24 18:01:57,614][01148] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2023-02-24 18:01:57,616][01148] Adding new argument 'train_script'=None that is not in the saved config file! [2023-02-24 18:01:57,617][01148] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2023-02-24 18:01:57,618][01148] Using frameskip 1 and render_action_repeat=4 for evaluation [2023-02-24 18:01:57,645][01148] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-24 18:01:57,647][01148] RunningMeanStd input shape: (3, 72, 128) [2023-02-24 18:01:57,651][01148] RunningMeanStd input shape: (1,) [2023-02-24 18:01:57,667][01148] ConvEncoder: input_channels=3 [2023-02-24 18:01:58,357][01148] Conv encoder output size: 512 [2023-02-24 18:01:58,359][01148] Policy head output size: 512 [2023-02-24 18:02:00,747][01148] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2023-02-24 18:02:02,035][01148] Num frames 100... 
[2023-02-24 18:02:02,144][01148] Num frames 200... [2023-02-24 18:02:02,257][01148] Num frames 300... [2023-02-24 18:02:02,373][01148] Num frames 400... [2023-02-24 18:02:02,490][01148] Num frames 500... [2023-02-24 18:02:02,602][01148] Num frames 600... [2023-02-24 18:02:02,720][01148] Num frames 700... [2023-02-24 18:02:02,836][01148] Avg episode rewards: #0: 13.530, true rewards: #0: 7.530 [2023-02-24 18:02:02,837][01148] Avg episode reward: 13.530, avg true_objective: 7.530 [2023-02-24 18:02:02,898][01148] Num frames 800... [2023-02-24 18:02:03,007][01148] Num frames 900... [2023-02-24 18:02:03,114][01148] Num frames 1000... [2023-02-24 18:02:03,225][01148] Num frames 1100... [2023-02-24 18:02:03,347][01148] Num frames 1200... [2023-02-24 18:02:03,468][01148] Num frames 1300... [2023-02-24 18:02:03,583][01148] Num frames 1400... [2023-02-24 18:02:03,697][01148] Num frames 1500... [2023-02-24 18:02:03,813][01148] Num frames 1600... [2023-02-24 18:02:03,924][01148] Num frames 1700... [2023-02-24 18:02:04,042][01148] Num frames 1800... [2023-02-24 18:02:04,152][01148] Num frames 1900... [2023-02-24 18:02:04,268][01148] Num frames 2000... [2023-02-24 18:02:04,385][01148] Num frames 2100... [2023-02-24 18:02:04,482][01148] Avg episode rewards: #0: 23.145, true rewards: #0: 10.645 [2023-02-24 18:02:04,484][01148] Avg episode reward: 23.145, avg true_objective: 10.645 [2023-02-24 18:02:04,563][01148] Num frames 2200... [2023-02-24 18:02:04,679][01148] Num frames 2300... [2023-02-24 18:02:04,794][01148] Num frames 2400... [2023-02-24 18:02:04,914][01148] Num frames 2500... [2023-02-24 18:02:05,023][01148] Num frames 2600... [2023-02-24 18:02:05,138][01148] Num frames 2700... [2023-02-24 18:02:05,252][01148] Num frames 2800... [2023-02-24 18:02:05,368][01148] Num frames 2900... [2023-02-24 18:02:05,483][01148] Num frames 3000... [2023-02-24 18:02:05,595][01148] Num frames 3100... [2023-02-24 18:02:05,706][01148] Num frames 3200... 
[2023-02-24 18:02:05,824][01148] Num frames 3300... [2023-02-24 18:02:05,936][01148] Num frames 3400... [2023-02-24 18:02:06,052][01148] Num frames 3500... [2023-02-24 18:02:06,175][01148] Num frames 3600... [2023-02-24 18:02:06,234][01148] Avg episode rewards: #0: 27.337, true rewards: #0: 12.003 [2023-02-24 18:02:06,235][01148] Avg episode reward: 27.337, avg true_objective: 12.003 [2023-02-24 18:02:06,352][01148] Num frames 3700... [2023-02-24 18:02:06,471][01148] Num frames 3800... [2023-02-24 18:02:06,591][01148] Num frames 3900... [2023-02-24 18:02:06,702][01148] Num frames 4000... [2023-02-24 18:02:06,815][01148] Num frames 4100... [2023-02-24 18:02:06,925][01148] Num frames 4200... [2023-02-24 18:02:07,041][01148] Num frames 4300... [2023-02-24 18:02:07,152][01148] Num frames 4400... [2023-02-24 18:02:07,265][01148] Num frames 4500... [2023-02-24 18:02:07,374][01148] Num frames 4600... [2023-02-24 18:02:07,524][01148] Num frames 4700... [2023-02-24 18:02:07,684][01148] Num frames 4800... [2023-02-24 18:02:07,772][01148] Avg episode rewards: #0: 27.543, true rewards: #0: 12.042 [2023-02-24 18:02:07,773][01148] Avg episode reward: 27.543, avg true_objective: 12.042 [2023-02-24 18:02:07,913][01148] Num frames 4900... [2023-02-24 18:02:08,066][01148] Num frames 5000... [2023-02-24 18:02:08,223][01148] Num frames 5100... [2023-02-24 18:02:08,376][01148] Num frames 5200... [2023-02-24 18:02:08,539][01148] Num frames 5300... [2023-02-24 18:02:08,704][01148] Num frames 5400... [2023-02-24 18:02:08,828][01148] Avg episode rewards: #0: 24.480, true rewards: #0: 10.880 [2023-02-24 18:02:08,833][01148] Avg episode reward: 24.480, avg true_objective: 10.880 [2023-02-24 18:02:08,935][01148] Num frames 5500... [2023-02-24 18:02:09,091][01148] Num frames 5600... [2023-02-24 18:02:09,250][01148] Num frames 5700... [2023-02-24 18:02:09,423][01148] Num frames 5800... 
[2023-02-24 18:02:09,521][01148] Avg episode rewards: #0: 21.373, true rewards: #0: 9.707 [2023-02-24 18:02:09,523][01148] Avg episode reward: 21.373, avg true_objective: 9.707 [2023-02-24 18:02:09,658][01148] Num frames 5900... [2023-02-24 18:02:09,823][01148] Num frames 6000... [2023-02-24 18:02:09,981][01148] Num frames 6100... [2023-02-24 18:02:10,151][01148] Num frames 6200... [2023-02-24 18:02:10,311][01148] Num frames 6300... [2023-02-24 18:02:10,475][01148] Num frames 6400... [2023-02-24 18:02:10,617][01148] Avg episode rewards: #0: 19.926, true rewards: #0: 9.211 [2023-02-24 18:02:10,618][01148] Avg episode reward: 19.926, avg true_objective: 9.211 [2023-02-24 18:02:10,706][01148] Num frames 6500... [2023-02-24 18:02:10,870][01148] Num frames 6600... [2023-02-24 18:02:11,020][01148] Num frames 6700... [2023-02-24 18:02:11,129][01148] Num frames 6800... [2023-02-24 18:02:11,245][01148] Num frames 6900... [2023-02-24 18:02:11,362][01148] Num frames 7000... [2023-02-24 18:02:11,483][01148] Num frames 7100... [2023-02-24 18:02:11,604][01148] Num frames 7200... [2023-02-24 18:02:11,717][01148] Num frames 7300... [2023-02-24 18:02:11,776][01148] Avg episode rewards: #0: 19.876, true rewards: #0: 9.126 [2023-02-24 18:02:11,778][01148] Avg episode reward: 19.876, avg true_objective: 9.126 [2023-02-24 18:02:11,888][01148] Num frames 7400... [2023-02-24 18:02:12,007][01148] Num frames 7500... [2023-02-24 18:02:12,117][01148] Num frames 7600... [2023-02-24 18:02:12,232][01148] Num frames 7700... [2023-02-24 18:02:12,342][01148] Num frames 7800... [2023-02-24 18:02:12,464][01148] Num frames 7900... [2023-02-24 18:02:12,581][01148] Num frames 8000... [2023-02-24 18:02:12,687][01148] Avg episode rewards: #0: 19.158, true rewards: #0: 8.936 [2023-02-24 18:02:12,690][01148] Avg episode reward: 19.158, avg true_objective: 8.936 [2023-02-24 18:02:12,755][01148] Num frames 8100... [2023-02-24 18:02:12,866][01148] Num frames 8200... 
[2023-02-24 18:02:12,979][01148] Num frames 8300... [2023-02-24 18:02:13,092][01148] Num frames 8400... [2023-02-24 18:02:13,201][01148] Num frames 8500... [2023-02-24 18:02:13,311][01148] Num frames 8600... [2023-02-24 18:02:13,428][01148] Num frames 8700... [2023-02-24 18:02:13,574][01148] Avg episode rewards: #0: 18.778, true rewards: #0: 8.778 [2023-02-24 18:02:13,576][01148] Avg episode reward: 18.778, avg true_objective: 8.778 [2023-02-24 18:03:05,828][01148] Replay video saved to /content/train_dir/default_experiment/replay.mp4! [2023-02-24 18:04:11,552][01148] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2023-02-24 18:04:11,556][01148] Overriding arg 'num_workers' with value 1 passed from command line [2023-02-24 18:04:11,560][01148] Adding new argument 'no_render'=True that is not in the saved config file! [2023-02-24 18:04:11,563][01148] Adding new argument 'save_video'=True that is not in the saved config file! [2023-02-24 18:04:11,566][01148] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2023-02-24 18:04:11,567][01148] Adding new argument 'video_name'=None that is not in the saved config file! [2023-02-24 18:04:11,568][01148] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! [2023-02-24 18:04:11,570][01148] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2023-02-24 18:04:11,572][01148] Adding new argument 'push_to_hub'=True that is not in the saved config file! [2023-02-24 18:04:11,573][01148] Adding new argument 'hf_repository'='Arch4ngel/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! [2023-02-24 18:04:11,574][01148] Adding new argument 'policy_index'=0 that is not in the saved config file! [2023-02-24 18:04:11,576][01148] Adding new argument 'eval_deterministic'=False that is not in the saved config file! 
[2023-02-24 18:04:11,578][01148] Adding new argument 'train_script'=None that is not in the saved config file!
[2023-02-24 18:04:11,580][01148] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2023-02-24 18:04:11,581][01148] Using frameskip 1 and render_action_repeat=4 for evaluation
[2023-02-24 18:04:11,609][01148] RunningMeanStd input shape: (3, 72, 128)
[2023-02-24 18:04:11,611][01148] RunningMeanStd input shape: (1,)
[2023-02-24 18:04:11,625][01148] ConvEncoder: input_channels=3
[2023-02-24 18:04:11,662][01148] Conv encoder output size: 512
[2023-02-24 18:04:11,664][01148] Policy head output size: 512
[2023-02-24 18:04:11,684][01148] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2023-02-24 18:04:12,109][01148] Num frames 100...
[2023-02-24 18:04:12,219][01148] Num frames 200...
[2023-02-24 18:04:12,337][01148] Num frames 300...
[2023-02-24 18:04:12,450][01148] Num frames 400...
[2023-02-24 18:04:12,567][01148] Num frames 500...
[2023-02-24 18:04:12,684][01148] Num frames 600...
[2023-02-24 18:04:12,795][01148] Num frames 700...
[2023-02-24 18:04:12,919][01148] Num frames 800...
[2023-02-24 18:04:13,016][01148] Avg episode rewards: #0: 16.320, true rewards: #0: 8.320
[2023-02-24 18:04:13,018][01148] Avg episode reward: 16.320, avg true_objective: 8.320
[2023-02-24 18:04:13,100][01148] Num frames 900...
[2023-02-24 18:04:13,216][01148] Num frames 1000...
[2023-02-24 18:04:13,330][01148] Num frames 1100...
[2023-02-24 18:04:13,445][01148] Num frames 1200...
[2023-02-24 18:04:13,555][01148] Num frames 1300...
[2023-02-24 18:04:13,683][01148] Num frames 1400...
[2023-02-24 18:04:13,794][01148] Num frames 1500...
[2023-02-24 18:04:13,918][01148] Num frames 1600...
[2023-02-24 18:04:14,083][01148] Avg episode rewards: #0: 17.425, true rewards: #0: 8.425
[2023-02-24 18:04:14,085][01148] Avg episode reward: 17.425, avg true_objective: 8.425
[2023-02-24 18:04:14,107][01148] Num frames 1700...
[2023-02-24 18:04:14,220][01148] Num frames 1800...
[2023-02-24 18:04:14,338][01148] Num frames 1900...
[2023-02-24 18:04:14,453][01148] Num frames 2000...
[2023-02-24 18:04:14,571][01148] Num frames 2100...
[2023-02-24 18:04:14,687][01148] Num frames 2200...
[2023-02-24 18:04:14,799][01148] Num frames 2300...
[2023-02-24 18:04:14,925][01148] Num frames 2400...
[2023-02-24 18:04:15,043][01148] Num frames 2500...
[2023-02-24 18:04:15,187][01148] Avg episode rewards: #0: 17.937, true rewards: #0: 8.603
[2023-02-24 18:04:15,191][01148] Avg episode reward: 17.937, avg true_objective: 8.603
[2023-02-24 18:04:15,218][01148] Num frames 2600...
[2023-02-24 18:04:15,329][01148] Num frames 2700...
[2023-02-24 18:04:15,449][01148] Num frames 2800...
[2023-02-24 18:04:15,562][01148] Num frames 2900...
[2023-02-24 18:04:15,677][01148] Num frames 3000...
[2023-02-24 18:04:15,784][01148] Num frames 3100...
[2023-02-24 18:04:15,902][01148] Num frames 3200...
[2023-02-24 18:04:16,015][01148] Num frames 3300...
[2023-02-24 18:04:16,128][01148] Num frames 3400...
[2023-02-24 18:04:16,254][01148] Num frames 3500...
[2023-02-24 18:04:16,363][01148] Num frames 3600...
[2023-02-24 18:04:16,459][01148] Avg episode rewards: #0: 19.590, true rewards: #0: 9.090
[2023-02-24 18:04:16,460][01148] Avg episode reward: 19.590, avg true_objective: 9.090
[2023-02-24 18:04:16,534][01148] Num frames 3700...
[2023-02-24 18:04:16,645][01148] Num frames 3800...
[2023-02-24 18:04:16,755][01148] Num frames 3900...
[2023-02-24 18:04:16,870][01148] Num frames 4000...
[2023-02-24 18:04:16,994][01148] Num frames 4100...
[2023-02-24 18:04:17,106][01148] Num frames 4200...
[2023-02-24 18:04:17,218][01148] Num frames 4300...
[2023-02-24 18:04:17,328][01148] Num frames 4400...
[2023-02-24 18:04:17,445][01148] Num frames 4500...
[2023-02-24 18:04:17,557][01148] Num frames 4600...
[2023-02-24 18:04:17,682][01148] Num frames 4700...
[2023-02-24 18:04:17,798][01148] Num frames 4800...
[2023-02-24 18:04:17,923][01148] Num frames 4900...
[2023-02-24 18:04:18,035][01148] Num frames 5000...
[2023-02-24 18:04:18,147][01148] Num frames 5100...
[2023-02-24 18:04:18,264][01148] Num frames 5200...
[2023-02-24 18:04:18,382][01148] Num frames 5300...
[2023-02-24 18:04:18,546][01148] Avg episode rewards: #0: 23.392, true rewards: #0: 10.792
[2023-02-24 18:04:18,548][01148] Avg episode reward: 23.392, avg true_objective: 10.792
[2023-02-24 18:04:18,559][01148] Num frames 5400...
[2023-02-24 18:04:18,708][01148] Num frames 5500...
[2023-02-24 18:04:18,859][01148] Num frames 5600...
[2023-02-24 18:04:19,024][01148] Num frames 5700...
[2023-02-24 18:04:19,184][01148] Num frames 5800...
[2023-02-24 18:04:19,339][01148] Num frames 5900...
[2023-02-24 18:04:19,503][01148] Num frames 6000...
[2023-02-24 18:04:19,656][01148] Num frames 6100...
[2023-02-24 18:04:19,821][01148] Num frames 6200...
[2023-02-24 18:04:19,890][01148] Avg episode rewards: #0: 22.847, true rewards: #0: 10.347
[2023-02-24 18:04:19,893][01148] Avg episode reward: 22.847, avg true_objective: 10.347
[2023-02-24 18:04:20,043][01148] Num frames 6300...
[2023-02-24 18:04:20,203][01148] Num frames 6400...
[2023-02-24 18:04:20,356][01148] Num frames 6500...
[2023-02-24 18:04:20,510][01148] Num frames 6600...
[2023-02-24 18:04:20,670][01148] Num frames 6700...
[2023-02-24 18:04:20,827][01148] Num frames 6800...
[2023-02-24 18:04:20,991][01148] Num frames 6900...
[2023-02-24 18:04:21,152][01148] Num frames 7000...
[2023-02-24 18:04:21,320][01148] Num frames 7100...
[2023-02-24 18:04:21,485][01148] Num frames 7200...
[2023-02-24 18:04:21,650][01148] Num frames 7300...
[2023-02-24 18:04:21,809][01148] Num frames 7400...
[2023-02-24 18:04:21,974][01148] Num frames 7500...
[2023-02-24 18:04:22,147][01148] Num frames 7600...
[2023-02-24 18:04:22,271][01148] Num frames 7700...
[2023-02-24 18:04:22,395][01148] Num frames 7800...
[2023-02-24 18:04:22,510][01148] Num frames 7900...
[2023-02-24 18:04:22,646][01148] Avg episode rewards: #0: 25.809, true rewards: #0: 11.380
[2023-02-24 18:04:22,647][01148] Avg episode reward: 25.809, avg true_objective: 11.380
[2023-02-24 18:04:22,688][01148] Num frames 8000...
[2023-02-24 18:04:22,808][01148] Num frames 8100...
[2023-02-24 18:04:22,920][01148] Num frames 8200...
[2023-02-24 18:04:23,040][01148] Num frames 8300...
[2023-02-24 18:04:23,161][01148] Num frames 8400...
[2023-02-24 18:04:23,269][01148] Avg episode rewards: #0: 23.307, true rewards: #0: 10.557
[2023-02-24 18:04:23,274][01148] Avg episode reward: 23.307, avg true_objective: 10.557
[2023-02-24 18:04:23,338][01148] Num frames 8500...
[2023-02-24 18:04:23,451][01148] Num frames 8600...
[2023-02-24 18:04:23,573][01148] Num frames 8700...
[2023-02-24 18:04:23,685][01148] Num frames 8800...
[2023-02-24 18:04:23,843][01148] Avg episode rewards: #0: 21.433, true rewards: #0: 9.878
[2023-02-24 18:04:23,845][01148] Avg episode reward: 21.433, avg true_objective: 9.878
[2023-02-24 18:04:23,860][01148] Num frames 8900...
[2023-02-24 18:04:23,974][01148] Num frames 9000...
[2023-02-24 18:04:24,095][01148] Num frames 9100...
[2023-02-24 18:04:24,211][01148] Num frames 9200...
[2023-02-24 18:04:24,320][01148] Num frames 9300...
[2023-02-24 18:04:24,437][01148] Num frames 9400...
[2023-02-24 18:04:24,554][01148] Num frames 9500...
[2023-02-24 18:04:24,674][01148] Num frames 9600...
[2023-02-24 18:04:24,785][01148] Num frames 9700...
[2023-02-24 18:04:24,901][01148] Num frames 9800...
[2023-02-24 18:04:25,012][01148] Num frames 9900...
[2023-02-24 18:04:25,134][01148] Num frames 10000...
[2023-02-24 18:04:25,253][01148] Num frames 10100...
[2023-02-24 18:04:25,362][01148] Num frames 10200...
[2023-02-24 18:04:25,511][01148] Avg episode rewards: #0: 22.577, true rewards: #0: 10.277
[2023-02-24 18:04:25,513][01148] Avg episode reward: 22.577, avg true_objective: 10.277
[2023-02-24 18:05:26,077][01148] Replay video saved to /content/train_dir/default_experiment/replay.mp4!